PATENT APPLICATION

Attorney Docket No. 15966-556 CIP1 (Cura 56 CIP1)

POLYPEPTIDES AND POLYNUCLEOTIDES ENCODING SAME 

RELATED APPLICATIONS

This application is a continuation-in-part of USSN 09/619252 filed July 19, 2000, which
claims priority to USSN 60/144,722, filed July 20, 1999, and USSN 60/167,785, filed November
29, 1999; and is a continuation-in-part of USSN 60/276,994 filed March 19, 2001; USSN
60/280898 filed April 2, 2001; USSN 60/332,241 filed November 14, 2001; USSN 60/288,062
filed May 2, 2001; USSN 60/291,766 filed May 17, 2001; and USSN 60/314,007 filed August
21, 2001. The contents of these applications are incorporated herein by reference in their
entireties.

FIELD OF THE INVENTION 
The invention relates generally to polynucleotides and the polypeptides encoded
thereby and more particularly to polynucleotides encoding polypeptides that cross one or more
membranes in eukaryotic cells.

BACKGROUND OF THE INVENTION 

Eukaryotic cells are subdivided by membranes into multiple, functionally distinct compartments,
referred to as organelles. Many biologically important proteins are secreted from the cell after crossing
multiple membrane-bound organelles. These proteins can often be identified by the presence of sequence
motifs referred to as "sorting signals" in the protein, or in a precursor form of the protein. These sorting
signals can also aid in targeting the proteins to their appropriate destination.

One specific type of sorting signal is a signal sequence, which is also referred to as a signal
peptide or leader sequence. This signal sequence can be present as an amino-terminal extension
on a newly synthesized polypeptide. A signal sequence possesses the ability to "target" proteins to an
organelle known as the endoplasmic reticulum (ER).

The signal sequence takes part in an array of protein-protein and protein-lipid interactions that
result in the translocation of a signal sequence-containing polypeptide through a channel within the ER.
Following translocation, a membrane-bound enzyme, designated signal peptidase, liberates the mature
protein from the signal sequence.

Secreted and membrane-bound proteins are involved in many biologically diverse
activities. Examples of known, secreted proteins include, e.g., insulin, interferon, interleukin,
transforming growth factor-β, human growth hormone, erythropoietin, and lymphokine.
Only a limited number of genes encoding human membrane-bound and secreted proteins have
been identified.



Failure to thrive, nutritional edema, and hypoproteinemia with normal sweat electrolytes
in 2 affected male infants, reported by Townes et al. (J. Pediat. 71: 220-224, 1967), could be
treated by a protein hydrolysate diet. Morris and Fisher (Am. J. Dis. Child. 114: 203-208, 1967)
reported an affected female who also had imperforate anus, a result of a defect in the synthesis of
the enterokinase which activates proteolytic enzymes produced by the pancreas. Oral pancreatin
represents a therapeutically successful form of enzyme replacement. Trypsin, like elastase, is a
member of the pancreatic family of serine proteases. MacDonald et al. (J. Biol. Chem. 257:
9724-9732, 1982) reported nucleotide sequences of cDNAs representing 2 pancreatic rat
trypsinogens. The trypsin gene is on mouse chromosome 6 (Honey et al., Somat. Cell Molec.
Genet. 10: 369-376, 1984). Carboxypeptidase A and trypsin are a syntenic pair conserved in
mouse and man. Emi et al. (Gene 41: 305-310, 1986) isolated cDNA clones for 2 major human
trypsinogen isozymes from a pancreatic cDNA library. The deduced amino acid sequences had
89% homology and the same number of amino acids (247), including a 15-amino acid signal
peptide and an 8-amino acid activation peptide. Southern blot analysis of human genomic DNA
with the cloned cDNA as a probe showed that the human trypsinogen genes constitute a family
of more than 10. The gene encoding trypsin-1 (TRY1) is also referred to as serine protease-1
(PRSS1). Rowen et al. (Science 272: 1755-1762, 1996) found that there are 8 trypsinogen genes
embedded in the beta T-cell receptor locus or cluster of genes (TCRB) mapping to 7q35. In the
685-kb DNA segment that they sequenced they found 5 tandemly arrayed 10-kb locus-specific
repeats (homology units) at the 3-prime end of the locus. These repeats exhibited 90 to 91%
overall nucleotide similarity, and embedded within each is a trypsinogen gene. Alignment of
pancreatic trypsinogen cDNAs with the germline sequences showed that these trypsinogen genes
contain 5 exons that span approximately 3.6 kb. They denoted 8 trypsinogen genes T1 through
T8 from 5-prime to 3-prime. Some of the trypsinogen genes are expressed in nonpancreatic
tissues where their function is unknown. Rowen et al. (Science 272: 1755-1762, 1996) noted that
the intercalation of the trypsinogen genes in the TCRB locus is conserved in mouse and chicken,
suggesting shared functional or regulatory constraints, as has been postulated for genes in the
major histocompatibility complex (such as class I, II, and III genes) that share similar long-term
organizational relationships. The gene of the invention is a novel serine protease containing a trypsin
domain but localized on chromosome 16.

SUMMARY OF THE INVENTION

The invention is based, in part, upon the discovery of novel nucleic acids and secreted 
polypeptides encoded thereby. The nucleic acids and polypeptides are collectively referred to 
herein as "SECP". 

Accordingly, in one aspect, the invention includes an isolated nucleic acid that encodes a
SECP polypeptide, or a fragment, homolog, analog or derivative thereof. For example, the
nucleic acid can encode a polypeptide at least 85% identical to a polypeptide comprising the
amino acid sequences of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55
and 57. The nucleic acid can be, e.g., a genomic DNA fragment or a cDNA molecule. In some
embodiments, the invention provides an isolated nucleic acid molecule that includes the
nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13,
15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56.

Also included within the scope of the invention is a vector containing one or more of the
nucleic acids described herein, and a cell containing the vectors or nucleic acids described
herein.

The invention is also directed to host cells transformed with a vector comprising any of 
the nucleic acid molecules described above. 

In another aspect, the invention includes a pharmaceutical composition that includes a 
SECP nucleic acid and a pharmaceutically acceptable carrier or diluent. 

In a further aspect, the invention includes a substantially purified SECP polypeptide, e.g.,
any of the SECP polypeptides encoded by a SECP nucleic acid, and fragments, homologs,
analogs, and derivatives thereof. The invention also includes a pharmaceutical composition that
includes a SECP polypeptide and a pharmaceutically acceptable carrier or diluent.

In a still further aspect, the invention provides an antibody that binds specifically to a
SECP polypeptide. The antibody can be, e.g., a monoclonal or polyclonal antibody, and
fragments, homologs, analogs, and derivatives thereof. The invention also includes a
pharmaceutical composition including a SECP antibody and a pharmaceutically acceptable carrier
or diluent. The invention is also directed to isolated antibodies that bind to an epitope on a
polypeptide encoded by any of the nucleic acid molecules described above.

The invention also includes kits comprising any of the pharmaceutical compositions
described above.

The invention further provides a method for producing a SECP polypeptide by providing
a cell containing a SECP nucleic acid, e.g., a vector that includes a SECP nucleic acid, and
culturing the cell under conditions sufficient to express the SECP polypeptide encoded by the
nucleic acid. The expressed SECP polypeptide is then recovered from the cell. Preferably, the
cell produces little or no endogenous SECP polypeptide. The cell can be, e.g., a prokaryotic cell
or eukaryotic cell.

The invention is also directed to methods of identifying a SECP polypeptide or nucleic 
acids in a sample by contacting the sample with a compound that specifically binds to the 
polypeptide or nucleic acid, and detecting complex formation, if present. 

The invention further provides methods of identifying a compound that modulates the
activity of a SECP polypeptide by contacting a SECP polypeptide with a compound and
determining whether the SECP polypeptide activity is modified.

The invention is also directed to compounds that modulate SECP polypeptide activity
identified by contacting a SECP polypeptide with the compound and determining whether the
compound modifies activity of the SECP polypeptide, binds to the SECP polypeptide, or binds to
a nucleic acid molecule encoding a SECP polypeptide.

In another aspect, the invention provides a method of determining the presence of or
predisposition to a SECP-associated disorder in a subject. The method includes providing a
sample from the subject and measuring the amount of SECP polypeptide in the subject sample.
The amount of SECP polypeptide in the subject sample is then compared to the amount of SECP
polypeptide in a control sample. An alteration in the amount of SECP polypeptide in the subject
protein sample relative to the amount of SECP polypeptide in the control protein sample
indicates the subject has a tissue proliferation-associated condition. A control sample is
preferably taken from a matched individual, i.e., an individual of similar age, sex, or other
general condition but who is not suspected of having a tissue proliferation-associated condition.
Alternatively, the control sample may be taken from the subject at a time when the subject is not
suspected of having a tissue proliferation-associated disorder. In some embodiments, the SECP
is detected using a SECP antibody.
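
The subject-versus-control comparison described above can be sketched in Python. This is an illustration only, under stated assumptions: the function name secp_alteration, the tolerance parameter rel_tol, and any amounts passed to it are hypothetical and are not taken from the specification, which does not fix a numerical cutoff for what counts as an alteration.

```python
from typing import Tuple


def secp_alteration(subject_amount: float,
                    control_amount: float,
                    rel_tol: float = 0.0) -> Tuple[float, bool]:
    """Compare a subject measurement with a control measurement.

    Returns the subject/control ratio and a flag that is True when the
    ratio departs from 1.0 by more than rel_tol. rel_tol is a hypothetical
    tolerance chosen by the user; the specification gives no threshold.
    """
    if control_amount <= 0:
        raise ValueError("control amount must be positive")
    ratio = subject_amount / control_amount
    altered = abs(ratio - 1.0) > rel_tol
    return ratio, altered
```

Both amounts are assumed to be in the same units and measured by the same assay; how large a tolerance is appropriate would depend on the variability of that assay.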

In a further aspect, the invention provides a method of determining the presence of or
predisposition to a SECP-associated disorder in a subject. The method includes providing a
nucleic acid sample (e.g., RNA or DNA, or both) from the subject and measuring the amount of
the SECP nucleic acid in the subject nucleic acid sample. The amount of SECP nucleic acid
in the subject nucleic acid sample is then compared to the amount of a SECP nucleic acid in a
control sample. An alteration in the amount of SECP nucleic acid in the sample relative to the
amount of SECP in the control sample indicates the subject has a tissue proliferation-associated
disorder.

In a still further aspect, the invention provides a method of treating or preventing or
delaying a SECP-associated disorder. The method includes administering to a subject in which
such treatment or prevention or delay is desired a SECP nucleic acid, a SECP polypeptide, or a
SECP antibody in an amount sufficient to treat, prevent, or delay a tissue proliferation-associated
disorder in the subject.

Unless otherwise defined, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art to which this invention
belongs. Although methods and materials similar or equivalent to those described herein can be
used in the practice or testing of the invention, suitable methods and materials are described
below. All publications, patent applications, patents, and other references mentioned herein are
incorporated by reference in their entirety. In the case of conflict, the present Specification,
including definitions, will control. In addition, the materials, methods, and examples are
illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following 
detailed description and claims. 

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 is a representation of a SECP1 nucleic acid sequence (SEQ ID NO:1) according
to the invention, along with an amino acid sequence (SEQ ID NO:2) encoded by the nucleic acid
sequence.

FIG. 2 is a representation of a SECP2 nucleic acid sequence (SEQ ID NO:3) according
to the invention, along with an amino acid sequence (SEQ ID NO:4) encoded by the nucleic acid
sequence.

FIG. 3 is a representation of a SECP3 nucleic acid sequence (SEQ ID NO:5) according
to the invention, along with an amino acid sequence (SEQ ID NO:6) encoded by the nucleic acid
sequence.

FIG. 4 is a representation of a SECP4 nucleic acid sequence (SEQ ID NO:7) according
to the invention, along with an amino acid sequence (SEQ ID NO:8) encoded by the nucleic acid
sequence.

FIG. 5 is a representation of a SECP5 nucleic acid sequence (SEQ ID NO:9) according
to the invention, along with an amino acid sequence (SEQ ID NO:10) encoded by the nucleic
acid sequence.

FIG. 6 is a representation of a SECP6 nucleic acid sequence (SEQ ID NO:11) according
to the invention, along with an amino acid sequence (SEQ ID NO:12) encoded by the nucleic
acid sequence.

FIG. 7 is a representation of a SECP7 nucleic acid sequence (SEQ ID NO:13) according
to the invention, along with an amino acid sequence (SEQ ID NO:14) encoded by the nucleic
acid sequence.

FIG. 8 is a representation of a SECP8 nucleic acid sequence (SEQ ID NO:15) according
to the invention, along with an amino acid sequence (SEQ ID NO:16) encoded by the nucleic
acid sequence.

FIG. 9 is a representation of a SECP9 nucleic acid sequence (SEQ ID NO:17) according
to the invention, along with an amino acid sequence (SEQ ID NO:18) encoded by the nucleic
acid sequence.

FIG. 10 is a representation of an alignment of the proteins encoded by clones
11618130.0.27 (SEQ ID NO:4) and 11618130.0.184 (SEQ ID NO:16).

FIG. 11 is a representation of an alignment of the proteins encoded by clones
14578444.0.143 (SECP4; SEQ ID NO:8) and 14578444.0.47 (SECP5; SEQ ID NO:10).

FIG. 12 is a representation of a Western blot of a polypeptide expressed in 293 cells of a
polynucleotide containing sequences encoded by clone 11618130.

FIG. 13 is a representation of a Western blot of a polypeptide expressed in 293 cells of a
polynucleotide containing sequence encoded by clone 16406477.

FIG. 14 is a representation of a real-time expression analysis of the clones of the
invention.

FIG. 15 is a representation of a SECP10 nucleic acid sequence (SEQ ID NO:40)
according to the invention, along with an amino acid sequence (SEQ ID NO:41) encoded by the
nucleic acid sequence.

FIG. 16 is a representation of a SECP11 nucleic acid sequence (SEQ ID NO:42)
according to the invention, along with an amino acid sequence (SEQ ID NO:43) encoded by the
nucleic acid sequence.

FIG. 17 is a representation of a SECP12 nucleic acid sequence (SEQ ID NO:44)
according to the invention, along with an amino acid sequence (SEQ ID NO:45) encoded by the
nucleic acid sequence.

FIG. 18 is a representation of a SECP13 nucleic acid sequence (SEQ ID NO:46)
according to the invention, along with an amino acid sequence (SEQ ID NO:47) encoded by the
nucleic acid sequence.

FIG. 19 is a representation of a SECP14 nucleic acid sequence (SEQ ID NO:48)
according to the invention, along with an amino acid sequence (SEQ ID NO:49) encoded by the
nucleic acid sequence.

FIG. 20 is a representation of a SECP15 nucleic acid sequence (SEQ ID NO:50)
according to the invention, along with an amino acid sequence (SEQ ID NO:51) encoded by the
nucleic acid sequence.

FIG. 21 is a representation of a SECP16 nucleic acid sequence (SEQ ID NO:52)
according to the invention, along with an amino acid sequence (SEQ ID NO:53) encoded by the
nucleic acid sequence.

FIG. 22 is a representation of a SECP17 nucleic acid sequence (SEQ ID NO:54)
according to the invention, along with an amino acid sequence (SEQ ID NO:55) encoded by the
nucleic acid sequence.

FIG. 23 is a representation of a SECP18 nucleic acid sequence (SEQ ID NO:56)
according to the invention, along with an amino acid sequence (SEQ ID NO:57) encoded by the
nucleic acid sequence.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides novel polynucleotides and the polypeptides encoded thereby.
Included in the invention are ten novel nucleic acid sequences and their encoded polypeptides.
These sequences are collectively referred to as "SECP nucleic acids" or "SECP polynucleotides"
and the corresponding encoded polypeptide is referred to as a "SECP polypeptide" or "SECP
protein". For example, a SECP nucleic acid according to the invention is a nucleic acid
including a SECP nucleic acid, and a SECP polypeptide according to the invention is a
polypeptide that includes the amino acid sequence of a SECP polypeptide. Unless indicated
otherwise, "SECP" is meant to refer to any of the novel sequences disclosed herein. Each of the
nucleic acid and amino acid sequences has been assigned a unique SECP Identification
Number, with designations SECP1 through SECP10.

TABLE 1 provides a cross-reference to the assigned SECP Number, Clone or Probe
Identification Number, and Sequence Identification Number (SEQ ID NO:) for both the nucleic
acid and encoded polypeptides of SECP1-14.



TABLE 1

CLONE/PROBE                              FIGURE   SEQ ID NO:       SEQ ID NO:
                                                  (Nucleic Acid)   (Polypeptide)

21433858                                 1        1                2
11618130.0.27, also called CG50817-03    2        3                4
11696905-0-47                            3        5                6
14578444.0.143                           4        7                8
14578444.0.47                            5        9                10
14998905.0.65                            6        11               12
16406477.0.206                           7        13               14
11618130.0.184                           8        15               16
21637262.0.64                            9        17               18
CG106318-01                              15       40               41
CG50817-04                               16       42               43
CG50817-05                               17       44               45
CG50817-06                               18       46               47
CG51099-03                               19       48               49
CG57051-04                               20       50               51
CG57051-05                               21       52               53
CG57051-02                               22       54               55
CG57051-03                               23       56               57
11618130 Forward                                  19
11618130 Reverse                                  20
PSec-V5-His Forward                               21
PSec-V5-His Reverse                               22
16406477 Forward                                  23
16406477 Reverse                                  24
Ag383 (F)                                         25
Ag383 (R)                                         26
Ag383 (P)                                         27
Ag53 (F)                                          28
Ag53 (R)                                          29
Ag53 (P)                                          30
Ag127 (F)                                         31
Ag127 (R)                                         32
Ag127 (P)                                         33
Ab5 (F)                                           34
Ab5 (R)                                           35
Ab5 (P)                                           36
Ag815 (F)                                         37
Ag815 (R)                                         38
Ag815 (P)                                         39


Nucleic acid sequences and polypeptide sequences for SECP nucleic acids and 
polypeptides, as disclosed herein, are provided in the following section of the Specification. 



SECP nucleic acids, and their encoded polypeptides, according to the invention are useful
in a variety of applications and contexts. For example, various SECP nucleic acids and
polypeptides according to the invention are useful, inter alia, as novel members of the protein
families according to the presence of domains and sequence relatedness to previously described
proteins.

SECP nucleic acids and polypeptides according to the invention can also be used to
identify cell types based on the presence or absence of various SECP nucleic acids according to
the invention. Additional utilities for SECP nucleic acids and polypeptides are discussed below.

SECP1

A SECP1 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:1) and encoded polypeptide sequence (SEQ ID NO:2) of clone
21433858. FIG. 1 illustrates the nucleic acid and amino acid sequences, as well as the alignment
between these two sequences.

This clone includes a nucleotide sequence (SEQ ID NO:1) of 6373 bp. The nucleotide
sequence includes an open reading frame (ORF) encoding a polypeptide of 1588 amino acid
residues (SEQ ID NO:2) with a predicted molecular weight of 178042.1 Daltons. The start
codon is located at nucleotides 235-237 and the stop codon is located at nucleotides 4999-5001.
The protein encoded by clone 21433858 is predicted by the PSORT program to localize in the
plasma membrane with a certainty of 0.7300. The program SignalP predicts that there is a signal
peptide with the most probable cleavage site located between residues 23 and 24, in the sequence
CMG-DE.
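
The stated codon coordinates and polypeptide length for clone 21433858 can be cross-checked with simple arithmetic: the number of coding codons equals the distance from the first nucleotide of the start codon to the first nucleotide of the stop codon, divided by three. The following Python sketch illustrates that check; the function name orf_codon_count is hypothetical and is not part of the specification.

```python
def orf_codon_count(start_first_nt: int, stop_first_nt: int) -> int:
    """Count the codons from the start codon up to, but not including,
    the stop codon.

    Both arguments are 1-based positions of the first nucleotide of the
    respective codon. A ValueError is raised if the two codons are not
    in the same reading frame.
    """
    span = stop_first_nt - start_first_nt
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


# Clone 21433858: start codon at 235-237, stop codon at 4999-5001.
print(orf_codon_count(235, 4999))  # 1588
```

The result, 1588, agrees with the 1588 amino acid residues reported for SEQ ID NO:2.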

Real-time gene expression analysis was performed on SECP1 (clone 21433858). The
results demonstrate that RNA sequences with homology to clone 21433858 are detected in
various cell types. The relative abundance of RNA homologous to clone 21433858 is shown in
FIG. 14 (see also Examples, below). Cell types include endothelial cells (treated and untreated),
pancreas, adipose, adrenal gland, thyroid, mammary gland, myometrium, uterus, placenta,
prostate, and testis, and neoplastic cells derived from ovarian carcinoma OVCAR-3, ovarian
carcinoma OVCAR-5, ovarian carcinoma OVCAR-8, ovarian carcinoma IGROV-1, ovarian
carcinoma (ascites) SK-OV-3, breast carcinoma BT-549, prostate carcinoma (bone metastases)
PC-3, melanoma MM, and melanoma (met) SK-MEL-5. Accordingly, SECP1 nucleic acids
according to the invention can be used to identify one or more of these cell types. The presence
of RNA sequences homologous to a SECP1 nucleic acid in a sample indicates that the sample
contains one or more of the above cell types.

A search of sequence databases using BLASTX reveals that residues 299-1588 of the
polypeptide encoded by clone 21433858 are 100% identical to the 1290 residue human KIAA0960
protein (SPTREMBL-ACC:Q9UPZ6). In addition, the protein of clone 21433858 has 542
of 543 residues (99%) identical to, and 543 of 543 residues (100%) positive with, the 543 residue
fragment of a human hypothetical protein (SPTREMBL-ACC:O60407).
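
The whole-number percentages quoted above appear to be obtained by truncating, rather than rounding, the ratio of matching residues to aligned residues: 542 of 543 is about 99.8% but is reported as 99%. A minimal Python sketch of that calculation follows; the function name percent_whole is hypothetical, and the truncation rule is an inference from the figures in the text rather than a statement from the specification.

```python
def percent_whole(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching residues, truncated toward zero.

    matches is the number of identical (or positive) residues and aligned
    is the number of residues in the aligned region.
    """
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("expected 0 <= matches <= aligned and aligned > 0")
    return (100 * matches) // aligned


print(percent_whole(542, 543))  # 99
print(percent_whole(543, 543))  # 100
```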

The proteins of the invention encoded by clone 21433858 include the protein disclosed as
being encoded by the ORF described herein, as well as any mature protein arising therefrom as a
result of post-translational modifications. Thus, the proteins of the invention encompass both a
precursor and any active forms of the clone 21433858 protein.

SECP2 

A SECP2 nucleic acid and polypeptide according to the invention includes a nucleic acid
sequence (SEQ ID NO:3) and an encoded polypeptide sequence (SEQ ID NO:4) of clone
11618130.0.27, also called CG50817-03. FIG. 2 illustrates the nucleic acid sequence and amino
acid sequence, as well as the alignment between these two sequences.

This clone includes a nucleotide sequence (SEQ ID NO:3) of 1894 nucleotides. The
nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 267 amino
acid residues with a predicted molecular weight of 28043 Daltons. The start codon is at
nucleotides 732-734 and the stop codon is at nucleotides 1534-1536. The protein encoded by
clone 11618130.0.27 is predicted by the PSORT program to localize in the microbody
(peroxisome) with a certainty of 0.5035. The program SignalP predicts that there is no signal
peptide in the encoded polypeptide.

A search of the sequence databases using BLASTP and BLASTX reveals that clone
11618130.0.27 has 330 of 333 residues (99%) identical to and positive with a 571 residue human
protein termed PRO351 (PCT Publication WO9946281-A2 published September 16, 1999). In
addition, it was found to have 83 of 250 residues (33%) identical to, and 119 of 250 residues
(47%) positive with, the 343 residue human prostasin precursor (EC 3.4.21.-)
(SWISSPROT-ACC:Q16651).

The proteins of the invention encoded by clone 11618130.0.27 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 11618130.0.27 protein.

SECP3

A SECP3 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:5) and encoded polypeptide sequence (SEQ ID NO:6) of clone
11696905-0-47. FIG. 3 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 11696905-0-47 was obtained from fetal brain. In addition, RNA sequences were
also found to be present in tissues including uterus, pregnant and non-pregnant uterus, ovarian
tumor, placenta, bone marrow, hippocampus, synovial membrane, fetal heart, fetal lung, pineal
gland and melanocytes. This clone includes a nucleotide sequence of 1855 bp (SEQ ID NO:5).
The nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 405
amino acid residues (SEQ ID NO:6) with a predicted molecular weight of 44750 Daltons. The
start codon is located at nucleotides 154-156 and the stop codon is located at nucleotides
1369-1371. The protein encoded by clone 11696905-0-47 is predicted by the PSORT program to
localize extracellularly with a certainty of 0.7332. The program SignalP predicts that there is a
signal peptide with the most probable cleavage site located between residues 25 and 26, in the
sequence AQG-GP.

Real-time gene expression analysis was performed on SECP3 (clone 11696905-0-47).
The results demonstrate that RNA sequences homologous to clone 11696905-0-47 are detected
in various cell types. Cell types include adipose, adrenal gland, thyroid, brain, heart, skeletal
muscle, bone marrow, colon, bladder, liver, lung, mammary gland, placenta, and testis, and
neoplastic cells derived from renal carcinoma A498, lung carcinoma NCI-H460, and melanoma
SK-MEL-28.

Accordingly, SECP3 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP3 nucleic acid in
a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 11696905-0-47 has
403 of 405 residues (99%) identical to, and 404 of 405 residues (99%) positive with, the 405
residue human angiopoietin-related protein (SPTREMBL-ACC:Q9Y5B3). Angiopoietin
homologues are useful to stimulate cell growth and tissue development. The polypeptides of
clone 11696905-0-47 tend to be found as multimeric proteins (see Example 7) and are believed
to have angiogenic or hematopoietic activity. They can thus be used in assays for angiogenic
activity, as well as used therapeutically to stimulate restoration of vascular structure in various
tissues. Examples of such uses include, but are not limited to, treatment of full-thickness skin
wounds, including venous stasis ulcers and other chronic, non-healing wounds, as well as
fracture repair, skin grafting, reconstructive surgery, and establishment of vascular networks in
transplanted cells and tissues.

The proteins of the invention encoded by clone 11696905-0-47 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 11696905-0-47 protein.

SECP4 

A SECP4 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:7) and encoded polypeptide sequence (SEQ ID NO:8) of clone
14578444.0.143. FIG. 4 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 14578444.0.143 was obtained from fetal brain. This clone includes a nucleotide
sequence (SEQ ID NO:7) of 3026 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 776 amino acid residues (SEQ ID NO:8) with a predicted
molecular weight of 86220.8 Daltons. The start codon is located at nucleotides 55-57 and the
stop codon is located at nucleotides 2384-2386. The protein encoded by clone 14578444.0.143
is predicted by the PSORT program to localize in the endoplasmic reticulum (membrane) with a
certainty of 0.8200. The program SignalP predicts that there is a signal peptide with the most
probable cleavage site located between residues 23 and 24 in the sequence AEA-RE.

A search of the sequence databases using BLASTX reveals that clone 14578444.0.143
has 655 of 757 residues (86%) identical to, and 702 of 757 residues (92%) positive with, the 956
residue murine matrilin-2 precursor protein (SWISSPROT-ACC:O08746), extending over
residues 1-754 of the reference protein. Additional similarities are found with lower identities in
residues 649-837 of the murine protein. Additionally, the search shows that there is a lower
degree of similarity to murine matrilin-4 precursor. The protein of clone 14578444.0.143 also
has 595 of 606 residues (98%) identical to, and 598 of 606 residues (98%) positive with, the 632
residue human matrilin-3 (PCT publication WO9904002-A1).

The matrilin proteins and polynucleotides can be used for treating a variety of
developmental disorders (e.g., renal tubular acidosis, anemia, Cushing's syndrome). The proteins
can serve as targets for antagonists that should be of use in treating diseases related to abnormal
vesicle trafficking. These may include, but are not limited to, diseases such as cystic fibrosis,
glucose-galactose malabsorption syndrome, hypercholesterolaemia, diabetes mellitus, diabetes
insipidus, hyper- and hypoglycemia, Graves' disease, goiter, Cushing's disease, Addison's
disease, gastrointestinal disorders including ulcerative colitis, gastric and duodenal ulcers, and
other conditions associated with abnormal vesicle trafficking including AIDS, and allergies
including hay fever, asthma, and urticaria (hives), autoimmune hemolytic anemia, proliferative
glomerulonephritis, inflammatory bowel disease, multiple sclerosis, myasthenia gravis,
rheumatoid and osteoarthritis, scleroderma, Chediak-Higashi and Sjogren's syndromes, systemic
lupus erythematosus, toxic shock syndrome, traumatic tissue damage, and viral, bacterial,
fungal, helminth, and protozoal infections, a neoplastic disorder (e.g., adenocarcinoma, leukemia,
lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers), or an immune
disorder (e.g., AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia,
asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease and ulcerative colitis).

The proteins of the invention encoded by clone 14578444.0.143 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the proteins encoded by clone
14578444.0.143 (SECP4).

SECP5 

A SECP5 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:9) and encoded polypeptide sequence (SEQ ID NO:10) of clone
14578444.0.47. FIG. 5 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 14578444.0.47 was obtained from fetal brain. This clone includes a nucleotide
sequence (SEQ ID NO:9) of 3447 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 959 amino acid residues (SEQ ID NO:10) with a predicted
molecular weight of 107144 Daltons. The start codon is located at nucleotides 55-57 and the
stop codon is located at nucleotides 2933-2935. The protein encoded by clone 14578444.0.47 is
predicted by the PSORT program to localize to the endoplasmic reticulum (membrane) with a
certainty of 0.8200. The program SignalP predicts that there is a signal peptide with the most
probable cleavage site located between residues 23 and 24, in the sequence AEA-RE.

A search of the sequence databases using BLASTX reveals that clone 14578444.0.47 has
829 of 959 residues (86%) identical to, and 887 of 959 residues (92%) positive with, the 956
residue murine matrilin-2 precursor protein (SWISSPROT-ACC:O08746). The protein
encoded by clone 14578444.0.47 also has 594 of 606 residues (98%) identical to, and 597 of 606
residues (98%) positive with, the 632 residue human matrilin-3 (PCT publication WO9904002).
In addition, the protein encoded by clone 14578444.0.47 has 616 of 678 residues (90%)
identical to, and 632 of 678 residues (93%) positive with, the 915 residue human protein PRO219
(PCT publication WO9914328-A2).
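
The identity and positive percentages quoted for the BLASTX hits can be reproduced from the residue counts. The following is a minimal sketch, not part of the BLASTX program itself; the function name `percent` is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the quoted figures (597 of 606 is 98.5% but is reported as 98%).

```python
def percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated (assumed convention, see lead-in)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# (identical, positive, aligned) residue counts quoted in the text
# for the protein encoded by clone 14578444.0.47.
hits = {
    "murine matrilin-2 precursor": (829, 887, 959),
    "human matrilin-3": (594, 597, 606),
    "915 residue human protein": (616, 632, 678),
}

for name, (identical, positive, aligned) in hits.items():
    print(f"{name}: {percent(identical, aligned)}% identical, "
          f"{percent(positive, aligned)}% positive")
```

Integer arithmetic is used so that the result does not depend on floating-point rounding.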

The proteins encoded by clones 14578444.0.143 (SECP4) and 14578444.0.47 (SECP5)
are compared in an amino acid residue alignment shown in FIG. 11. It can be seen that the main
portions of the two proteins, starting with their amino-termini, are virtually identical, and that short
sequences in each corresponding to the carboxyl-terminal sequence of the shorter protein, clone
14578444.0.143, differ from one another. Furthermore, clone 14578444.0.47 has an extended
carboxyl-terminal sequence that is missing in clone 14578444.0.143. Therefore, clones
14578444.0.143 (SECP4) and 14578444.0.47 (SECP5) are apparently related to one another as
splice variants, with respect to their sequences at the carboxyl-terminal ends.

The matrilin proteins and polynucleotides can be used for treating a variety of
developmental disorders (e.g., renal tubular acidosis, anemia, Cushing's syndrome). The proteins
can serve as targets for antagonists that should be of use in treating diseases related to abnormal
vesicle trafficking. These may include, but are not limited to, diseases such as cystic fibrosis,
glucose-galactose malabsorption syndrome, hypercholesterolaemia, diabetes mellitus, diabetes
insipidus, hyper- and hypoglycemia, Graves' disease, goiter, Cushing's disease, Addison's
disease, gastrointestinal disorders including ulcerative colitis, gastric and duodenal ulcers, and
other conditions associated with abnormal vesicle trafficking including AIDS, and allergies
including hay fever, asthma, and urticaria (hives), autoimmune hemolytic anemia, proliferative
glomerulonephritis, inflammatory bowel disease, multiple sclerosis, myasthenia gravis,
rheumatoid and osteoarthritis, scleroderma, Chediak-Higashi and Sjogren's syndromes, systemic
lupus erythematosus, toxic shock syndrome, traumatic tissue damage, and viral, bacterial,
fungal, helminth, and protozoal infections, a neoplastic disorder (e.g., adenocarcinoma, leukemia,
lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers), or an immune
disorder (e.g., AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia,
asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease and ulcerative colitis).

The proteins of the invention encoded by clone 14578444.0.47 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the proteins encoded by clone
14578444.0.47 (SECP5).

SECP6 

A SECP6 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:11) and encoded polypeptide sequence (SEQ ID NO:12) of clone
14998905.0.65. FIG. 6 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 14998905.0.65 was obtained from lymphoid tissue, in particular, from the lymph
node. This clone includes a nucleotide sequence (SEQ ID NO:11) of 967 bp. The nucleotide
sequence includes an open reading frame (ORF) encoding a polypeptide of 245 amino acid
residues (SEQ ID NO:12) with a predicted molecular weight of 27327.2 Daltons. The start
codon is located at nucleotides 166-168 and the stop codon is located at nucleotides 902-904.
The protein encoded by clone 14998905.0.65 is predicted by the PSORT program to localize in
the microbody (peroxisome) with a certainty of 0.7480. PSORT predicts that there is no
amino-terminal signal sequence. Conversely, the program SignalP predicts that there is a signal peptide
with the most probable cleavage site located between residues 20 and 21, in the sequence
GIG-AE.
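
To make the cleavage-site notation concrete, the sketch below splits a precursor into a signal peptide and a mature chain at a 1-based residue position, as in "between residues 20 and 21, in the sequence GIG-AE". It is only an illustration: the function name `split_at_cleavage` is hypothetical, and the 30-residue toy sequence is invented for the example (it is not SEQ ID NO:12); only the placement of GIG at residues 18-20 and AE at residues 21-22 follows the text.

```python
def split_at_cleavage(precursor: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a precursor into (signal peptide, mature chain).

    last_signal_residue is the 1-based position of the final residue of the
    signal peptide; cleavage occurs immediately after it.
    """
    if not 0 < last_signal_residue < len(precursor):
        raise ValueError("cleavage site must fall inside the precursor")
    return precursor[:last_signal_residue], precursor[last_signal_residue:]


# Invented toy precursor (NOT the sequence of the clone): a Met, a
# hydrophobic run, then GIG at residues 18-20 and AE at residues 21-22.
toy_precursor = "M" + "L" * 16 + "GIG" + "AE" + "KTPSQDVR"

signal, mature = split_at_cleavage(toy_precursor, 20)
print(signal[-3:] + "-" + mature[:2])  # GIG-AE
```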

A search of the sequence databases using BLASTX reveals that clone 14998905.0.65 has
204 of 226 residues (90%) identical to, and 214 of 226 residues (94%) positive with, the 834
residue murine semaphorin 4C precursor protein (SWISSPROT-ACC:Q64151). Semaphorin 4C
is indicated as being a Type I membrane protein widely expressed in the nervous system during
development. In addition, it contains one immunoglobulin-like C2-type domain. The protein
encoded by clone 14998905.0.65 also has similarities to mouse CD100 antigen (PCT publication
WO9717368-A1) and to human semaphorin (JP10155490-A).

The proteins of the invention encoded by clone 14998905.0.65 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 14998905.0.65 protein.

SECP7

A SECP7 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:13) and encoded polypeptide sequence (SEQ ID NO:14) of clone
16406477.0.206. FIG. 7 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 16406477.0.206 was obtained from testis. In addition, sequences of clone
16406477.0.206 were also found in an RNA pool derived from adrenal gland, mammary gland,
prostate gland, testis, uterus, bone marrow, melanoma, pituitary gland, thyroid gland and spleen.
This clone includes a nucleotide sequence (SEQ ID NO:13) of 1359 bp with an open
reading frame (ORF) encoding a polypeptide of 385 amino acid residues (SEQ ID NO:14) with a
predicted molecular weight of 43087.3 Daltons. The start codon is located at nucleotides 45-47
and the stop codon is located at nucleotides 1201-1203. The protein encoded by clone
16406477.0.206 is predicted by the PSORT program to localize extracellularly with a certainty
of 0.5804 and to have a cleavable amino-terminal signal sequence. The program SignalP
predicts that there is a signal peptide with the most probable cleavage site located between
residues 39 and 40, in the sequence CWG-AG.

Real-time expression analysis was performed on SECP7 (clone 16406477.0.206). The
results demonstrate that RNA homologous to this clone is found in multiple cell and tissue types.
These cells and tissues include brain, mammary gland, and testis, as well as neoplastic cells derived
from ovarian carcinoma OVCAR-3, ovarian carcinoma OVCAR-5, ovarian carcinoma OVCAR-8,
ovarian carcinoma IGROV-1, breast carcinoma (pleural effusion) T47D, breast carcinoma BT-549,
and melanoma M14. Real-time gene expression analysis was performed on SECP3 (clone
11696905-0-47). The results demonstrate that RNA sequences homologous to clone
11696905-0-47 are detected in various cell types. Cell types include adipose, adrenal gland, thyroid, brain,
heart, skeletal muscle, bone marrow, colon, bladder, liver, lung, mammary gland, placenta, and
testis, as well as neoplastic cells derived from renal carcinoma A498, lung carcinoma NCI-H460,
and melanoma SK-MEL-28.

Accordingly, SECP7 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP7 nucleic acid in
a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 16406477.0.206 is
100% identical to a human testis-specific protein TSP50 (SPTREMBL-ACC:Q9UI38) with a
trypsin/chymotrypsin-like domain. In addition, the protein encoded by clone 16406477.0.206
has low similarity to the 343 residue human prostasin precursor (EC 3.4.21.-) (SWISSPROT-
ACC:Q16651).

The proteins of the invention encoded by clone 16406477.0.206 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 16406477.0.206 protein.

SECP8 

A SECP8 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:15) and encoded polypeptide sequence (SEQ ID NO:16) of clone
11618130.0.184. FIG. 8 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 11618130.0.184 includes a nucleotide sequence (SEQ ID NO:15) of 1445 bp. The
nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 198 amino
acid residues (SEQ ID NO:16) with a predicted molecular weight of 20659 Daltons. The start
codon is located at nucleotides 732-734 and the stop codon is located at nucleotides 1326-1328.
The protein encoded by clone 11618130.0.184 is predicted by the PSORT program to localize in
the cytoplasm. The program SignalP predicts that there is no signal peptide.

Clones 11618130.0.184 (SECP8) and 11618130.0.27 (SECP2) resemble each other in
that they are identical over most of their common sequences, and differ only at the
carboxyl-terminal end. In addition, clone 11618130.0.27 extends further at the carboxyl-terminal end than
does clone 11618130.0.184. An alignment of clones 11618130.0.27 and 11618130.0.184 is
shown in FIG. 10.

The proteins of the invention encoded by clone 11618130.0.184 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the 11618130.0.184 protein.

SECP9 

A SECP9 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:17) and encoded polypeptide sequence (SEQ ID NO:18) of clone
21637262.0.64. FIG. 9 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 21637262.0.64 was obtained from salivary gland. This clone includes a nucleotide
sequence (SEQ ID NO:17) of 1600 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 435 amino acid residues (SEQ ID NO:18) with a predicted
molecular weight of 47162.5 Daltons. The start codon is located at nucleotides 51-53 and the
stop codon is located at nucleotides 1356-1358. The protein encoded by clone 21637262.0.64 is
predicted by the PSORT program to localize in the cytoplasm with a certainty of 0.4500. Both
PSORT and SignalP predict that the protein has no amino-terminal
signal sequence.
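
The relationship between the quoted codon coordinates and the length of the encoded polypeptide can be checked with simple arithmetic. The sketch below is illustrative only; the function name `orf_residue_count` is hypothetical, and it assumes that the residue count excludes the stop codon and that coordinates are 1-based positions of the first nucleotide of each codon.

```python
def orf_residue_count(start_codon_first_nt: int, stop_codon_first_nt: int) -> int:
    """Number of amino acid residues encoded between a start and a stop codon.

    Both arguments are 1-based positions of the first nucleotide of the
    respective codon. The start codon is translated; the stop codon is not.
    """
    span = stop_codon_first_nt - start_codon_first_nt
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


# Coordinates quoted above for clone 21637262.0.64 (SECP9):
# start codon at nucleotides 51-53, stop codon at nucleotides 1356-1358.
print(orf_residue_count(51, 1356))  # 435
```

The in-frame check is deliberate: a span that is not a multiple of three indicates that at least one of the two coordinates does not mark a codon boundary in the stated reading frame.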

Real-time expression analysis was performed on SECP9 (clone 21637262.0.64). The
results demonstrate that RNA homologous to this clone is present in multiple tissue and cell
types. The relative amounts of RNA in various cell types are shown in FIG. 14 (see also the
Examples, below). The cells include myometrium, placenta, uterus, prostate, and testis, and
neoplastic cells derived from breast carcinoma (pleural effusion) T47D, breast carcinoma
(pleural effusion) MDA-MB-231, breast carcinoma BT-549, ovarian carcinoma OVCAR-3,
ovarian carcinoma OVCAR-5, prostate carcinoma (bone metastases) PC-3, melanoma M14, and
melanoma LOX IMVI.

Accordingly, SECP9 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP9 nucleic acid in
a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 21637262.0.64 has
123 of 420 residues (29%) identical to, and 201 of 420 residues (47%) positive with, the 1130
residue murine protein repetin (SWISSPROT-ACC:P97347). Repetin is a member of the "fused
gene" subgroup within the S100 gene family and is an epidermal differentiation protein.

The proteins of the invention encoded by clone 21637262.0.64 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 21637262.0.64 protein.

SECP10

A SECP10 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:40) and encoded polypeptide sequence (SEQ ID NO:41) of clone
CG106318. FIG. 15 illustrates the nucleic acid sequence and amino acid sequences. This clone
includes a nucleotide sequence (SEQ ID NO:40) of 4810 bp. The nucleotide sequence includes
an open reading frame (ORF) encoding a polypeptide of 1588 amino acid residues (SEQ ID
NO:41). The start codon is located at nucleotides 18-21 and the stop codon is located at
nucleotides 4782-4785. The protein encoded by clone CG106318-01 is predicted by the PSORT
program to localize in the nucleus with a certainty of 0.3500. Both PSORT and
SignalP predict that the protein has no amino-terminal signal sequence.

Real-time expression analysis was performed on SECP10 (clone CG106318). The results
demonstrate that RNA homologous to this clone is present in multiple tissue and cell types.

Accordingly, SECP10 nucleic acids according to the invention can be used to identify
one or more of these tissue types. The presence of RNA sequences homologous to a SECP10
nucleic acid in a sample indicates that the sample contains one or more of the above tissue types.

A search of the sequence databases using BLASTX reveals that clone CG106318 has
1587 out of 1588 (99.9%) of its residues identical to a human protein utilized in the treatment of
central nervous system disorders (AAM39295 to HYSEQ INC.).

The proteins of the invention encoded by clone CG106318-01 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone CG106318-01 protein.

PSORT — Prediction of Protein Translocation Sites version 5.8 
Results Summary:

plasma membrane                    Certainty=0.7000 (Affirmative) < succ>
nucleus                            Certainty=0.3500 (Affirmative) < succ>
microbody (peroxisome)             Certainty=0.3000 (Affirmative) < succ>
endoplasmic reticulum (membrane)   Certainty=0.2000 (Affirmative) < succ>
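
A results summary of this shape can be ranked programmatically. The following is a rough sketch under stated assumptions, not a PSORT interface: the regular expression, the function name `parse_psort`, and the exact line layout are assumptions made for illustration, and the input is simply the four summary lines above with the trailing markers omitted.

```python
import re

# Assumed line layout: "<site name>   Certainty=<score> (...)".
PSORT_LINE = re.compile(r"^\s*(?P<site>.+?)\s+Certainty\s*=\s*(?P<score>\d*\.\d+)")


def parse_psort(summary: str) -> list[tuple[str, float]]:
    """Return (site, certainty) pairs sorted from most to least certain."""
    hits = []
    for line in summary.splitlines():
        match = PSORT_LINE.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            hits.append((match.group("site"), float(match.group("score"))))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


summary = """\
plasma membrane                    Certainty=0.7000 (Affirmative)
nucleus                            Certainty=0.3500 (Affirmative)
microbody (peroxisome)             Certainty=0.3000 (Affirmative)
endoplasmic reticulum (membrane)   Certainty=0.2000 (Affirmative)
"""

ranked = parse_psort(summary)
for site, certainty in ranked:
    print(f"{certainty:.4f}  {site}")
```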

PFAM Domain Analysis

Query: 106318-01

Scores for sequence family classification (score includes all domains):

Model        Description                         Score    E-value   N
tsp_1        Thrombospondin type 1 domain        169.5    5.4e-47   11
toxin        Snake toxin                         -16.1    1.3       1
DUF18        Domain of unknown function DUF18    -55.9    7.8       1
Keratin_B2   Keratin, high sulfur B2 protein     -81.1    6.6       1

Sequences producing High-scoring Segment Pairs:                    Score   P(N)       N

gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Pat...       24050   0.0        1
gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA f...       19495   0.0        1
gb:GENBANK-ID:AB051466|acc:AB051466.1 Homo sapiens mRNA f...       3611    5.3e-269   6
gb:GENBANK-ID:AB006087|acc:AB006087.1 Danio rerio mRNA fo...       272     0.16       1
gb:GENBANK-ID:AF111298|acc:AF111298.1 HIV-1 isolate eur-0...       185     0.998      1

BLASTP: (1588 letters)

Database: Non-Redundant Composite Protein
704,847 sequences; 219,724,008 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

                                                                          Smallest
                                                                            Sum
                                                                   High  Probability
Sequences producing High-scoring Segment Pairs:                    Score   P(N)       N

ptnr:REMTREMBL-ACC:CAC32422 Sequence 1 from Patent WO0105...       8965    0.0        1
ptnr:SPTREMBL-ACC:Q9UPZ6 KIAA0960 PROTEIN - Homo sapiens ...       7298    0.0        1
ptnr:SPTREMBL-ACC:Q9C0I4 KIAA1679 PROTEIN - Homo sapiens ...       3983    0.0        1
ptnr:SPTREMBL-ACC:Q60407 HYPOTHETICAL PROTEIN - Homo sapi...       3026    3.1e-315   1

TABLE 2. BLASTN VERSUS GENBANK COMPOSITE

Sequences producing High-scoring Segment Pairs:                    Score   P(N)

gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Pat...       24050   0.0
gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA f...       19495   0.0
gb:GENBANK-ID:AB051466|acc:AB051466.1 Homo sapiens mRNA f...       3611    5.3e-269
gb:GENBANK-ID:AB006087|acc:AB006087.1 Danio rerio mRNA fo...       272     0.16
gb:GENBANK-ID:AF111298|acc:AF111298.1 HIV-1 isolate eur-0...       185     0.998

>gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Patent WO0105971 - Homo
sapiens, 6373 bp. (SEQ ID NO:58)
Length = 6373

Plus Strand HSPs:
Score = 24050 (3608.5 bits), Expect = 0.0, P = 0.0
Identities = 4810/4810 (100%), Positives = 4810/4810 (100%), Strand = Plus / Plus

Query:     1 GTCCATGGGGCCGATGTATGGGAGATGAATGTGGTCCCGGAGGCATCCAAACGAGGGCTG 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   218 GTCCATGGGGCCGATGTATGGGAGATGAATGTGGTCCCGGAGGCATCCAAACGAGGGCTG 277

Query:    61 TGTGGTGTGCTCATGTGGAGGGATGGACTACACTGCATACTAACTGTAAGCAGGCCGAGA 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   278 TGTGGTGTGCTCATGTGGAGGGATGGACTACACTGCATACTAACTGTAAGCAGGCCGAGA 337

Query:   121 GACCCAATAACCAGCAGAATTGTTTCAAAGTTTGTGATTGGCACAAAGAGTTGTACGACT 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   338 GACCCAATAACCAGCAGAATTGTTTCAAAGTTTGTGATTGGCACAAAGAGTTGTACGACT 397

Query:   181 GGAGACTGGGACCTTGGAATCAGTGTCAGCCCGTGATTTCAAAAAGCCTAGAGAAACCTC 240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   398 GGAGACTGGGACCTTGGAATCAGTGTCAGCCCGTGATTTCAAAAAGCCTAGAGAAACCTC 457

Query:   241 TTGAGTGCATTAAGGGGGAAGAAGGTATTCAGGTGAGGGAGATAGCGTGCATCCAGAAAG 300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   458 TTGAGTGCATTAAGGGGGAAGAAGGTATTCAGGTGAGGGAGATAGCGTGCATCCAGAAAG 517

Query:   301 ACAAAGACATTCCTGCGGAGGATATCATCTGTGAGTACTTTGAGCCCAAGCCTCTCCTGG 360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   518 ACAAAGACATTCCTGCGGAGGATATCATCTGTGAGTACTTTGAGCCCAAGCCTCTCCTGG 577

Query:   361 AGCAGGCTTGCCTCATTCCTTGCCAGCAAGATTGCATCGTGTCTGAATTTTCTGCCTGGT 420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   578 AGCAGGCTTGCCTCATTCCTTGCCAGCAAGATTGCATCGTGTCTGAATTTTCTGCCTGGT 637

Query:   421 CCGAATGCTCCAAGACCTGCGGCAGCGGGCTCCAGCACCGGACGCGTCATGTGGTGGCGC 480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   638 CCGAATGCTCCAAGACCTGCGGCAGCGGGCTCCAGCACCGGACGCGTCATGTGGTGGCGC 697

Query:   481 CCCCGCAGTTCGGAGGCTCTGGCTGTCCAAACCTGACGGAGTTCCAGGTGTGCCAATCCA 540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   698 CCCCGCAGTTCGGAGGCTCTGGCTGTCCAAACCTGACGGAGTTCCAGGTGTGCCAATCCA 757

Query:   541 GTCCATGCGAGGCCGAGGAGCTCAGGTACAGCCTGCATGTGGGGCCCTGGAGCACCTGCT 600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   758 GTCCATGCGAGGCCGAGGAGCTCAGGTACAGCCTGCATGTGGGGCCCTGGAGCACCTGCT 817

Query:   601 CAATGCCCCACTCCCGACAAGTAAGACAAGCAAGGAGACGCGGGAAGAATAAAGAACGGG 660
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   818 CAATGCCCCACTCCCGACAAGTAAGACAAGCAAGGAGACGCGGGAAGAATAAAGAACGGG 877

Query:   661 AAAAGGACCGCAGCAAAGGAGTAAAGGATCCAGAAGCCCGCGAGCTTATTAAGAAAAAGA 720
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   878 AAAAGGACCGCAGCAAAGGAGTAAAGGATCCAGAAGCCCGCGAGCTTATTAAGAAAAAGA 937

Query:   721 GAAACAGAAACAGGCAGAACAGACAAGAGAACAAATATTGGGACATCCAGATTGGATATC 780
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   938 GAAACAGAAACAGGCAGAACAGACAAGAGAACAAATATTGGGACATCCAGATTGGATATC 997

Query:   781 AGACCAGAGAGGTTATGTGCATTAACAAGACGGGGAAAGCTGCTGATTTAAGCTTTTGCC 840
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   998 AGACCAGAGAGGTTATGTGCATTAACAAGACGGGGAAAGCTGCTGATTTAAGCTTTTGCC 1057

Query:   841 AGCAAGAGAAGCTTCCAATGACCTTCCAGTCCTGTGTGATCACCAAAGAGTGCCAGGTTT 900
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1058 AGCAAGAGAAGCTTCCAATGACCTTCCAGTCCTGTGTGATCACCAAAGAGTGCCAGGTTT 1117

Query:   901 CCGAGTGGTCAGAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAG 960
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1118 CCGAGTGGTCAGAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAG 1177

Query:   961 GCACTCGTGTAAGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTC 1020
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1178 GCACTCGTGTAAGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTC 1237

Query:  1021 CAGAATTTGAAGAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCA 1080
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1238 CAGAATTTGAAGAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCA 1297

Query:  1081 CGTATGGCTGGAGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGC 1140
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1298 CGTATGGCTGGAGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGC 1357

Query:  1141 AGGACAAGAGGCGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGG 1200
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1358 AGGACAAGAGGCGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGG 1417

Query:  1201 TGTACTGCGTGCAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAACAAAG 1260
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1418 TGTACTGCGTGCAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAACAAAG 1477

Query:  1261 AAGCCTCAAAGCCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGC 1320
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1478 AAGCCTCAAAGCCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGC 1537

Query:  1321 TGTGCCACATTCCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTT 1380
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1538 TGTGCCACATTCCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTT 1597

Query:  1381 GTACTTATGAAAACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGC 1440
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1598 GTACTTATGAAAACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGC 1657

Query:  1441 GCATTACCAATGAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGG 1500
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1658 GCATTACCAATGAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGG 1717

Query:  1501 AAGCCATTCCCTGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACT 1560
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1718 AAGCCATTCCCTGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACT 1777

Query:  1561 GCGAGCCAGATAACGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCA 1620
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1778 GCGAGCCAGATAACGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCA 1837

Query:  1621 TCAACAGTGATGGAGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTCCCCATCC 1680
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1838 TCAACAGTGATGGAGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTCCCCATCC 1897

Query:  1681 CTGTGGCCTGTGATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGT 1740
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1898 CTGTGGCCTGTGATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGT 1957

Query:  1741 CCTCCTGCTCACACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGAT 1800
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1958 CCTCCTGCTCACACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGAT 2017

Query:  1801 CCATTCTGGCCTATGCGGGTGAAGAAGGTGGAATTCGCTGTCCAAATAGCAGTGCTTTGC 1860
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2018 CCATTCTGGCCTATGCGGGTGAAGAAGGTGGAATTCGCTGTCCAAATAGCAGTGCTTTGC 2077





Query:  1861 AAGAAGTACGAAGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCT 1920
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2078 AAGAAGTACGAAGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCT 2137

Query:  1921 GGGGCCAGTGCATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATG 1980
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2138 GGGGCCAGTGCATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATG 2197

Query:  2041 TGGGCCAAGTGGGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTT 2100
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2258 TGGGCCAAGTGGGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTT 2317

Query:  2101 GTCTGCTTCCTTGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCC 2160
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2318 GTCTGCTTCCTTGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCC 2377

Query:  2161 CCTCTTCGTGTAAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGCATCGGGTCATCA 2220
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2378 CCTCTTCGTGTAAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGCATCGGGTCATCA 2437

Query:  2581 CAGTTAGGACCAGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAA 2640
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2798 CAGTTAGGACCAGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAA 2857

Query:  2641 ATTCCCATTTGTATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCAC 2700
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2858 ATTCCCATTTGTATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCAC 2917

Query:  2701 AACCTGTGGGGAACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTGTTGCTGG 2760
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2918 AACCTGTGGGGAACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTGTTGCTGG 2977

Query:  2821 CATGCTACGATCAAAATGGCAGGCTTGTGGAAACATCTAGATGTAACAGCCATGGTTACA 2880
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  3038 CATGCTACGATCAAAATGGCAGGCTTGTGGAAACATCTAGATGTAACAGCCATGGTTACA 3097

Query:  2881 TTGAGGAGGCCTGCTTCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACT 2940
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  3098 TTGAGGAGGCCTGCTTCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACT 3157

Query:  2941 GGTCGCGCTGCAGCAAGTCCTGTGGGAGTGGTGTGAAGGTTCGTTCTAAATGGCTGCGTG 3000
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  3158 GGTCGCGCTGCAGCAAGTCCTGTGGGAGTGGTGTGAAGGTTCGTTCTAAATGGCTGCGTG 3217





SbjCt : 


3218 




Query: 


3061 


5 


SbjCt : 


3278 




Query : 


3121 


10 


SbjCt: 


3336 




Query : 


3181 




SbjCt: 


3398 


15 


Query: 


3241 




Sb j ct : 


3458 


20 


Query : 


3301 




Sbjct : 


3518 




Query : 


3361 


25 


Sbjct: 


3578 




Query: 


3421 


30 


Sbjct: 


3636 




Query: 


3481 




Sbjct : 


3698 


35 


Query : 


3541 




Sbjct : 


3758 


40 


Query: 


3601 




Sbjct : 


3818 




Query: 


3661 


45 


Sbjct: 


3878 




Query: 


3721 


50 


Sbjct: 


3938 




Query : 


3781 




Sbjct: 


3998 


55 


Query : 


3841 




Sb j ct : 


4056 


60 


Query: 


3901 




Sbjct : 


4118 




Query: 


3961 


65 


Sbjct : 


4178 




Query: 


4021 


70 


Sbjct : 


4238 




Query: 


4081 




Sbjct : 


4298 


75 


Query: 


4141 




Sbjct: 


4358 



3218 AAAAACCATATAATGGAGGAAGGCCTTGCCCCAAACTGGACCATGTCAACCAGGCACAGG 3277 

TGTATGAGGTTGTCCCATGCCACAGTGACTGCAACCAGTACCTATGGGTCACAGAGCCCT 3120 

IlillillilllllllllliillllllllillllllMlllilillMMIIilllllll 

TGTATGAGGTTGTCCCATGCCACAGTGACTGCAACCAGTACCTATGGGTCACAGAGCCCT 3337 
GGAGCATCTGCAAGGTGACCTTTGTGAATATGCGGGAGAACTGTGGAGAGGGCGTGCAAA 3180 

ItltllllllllllliilllllillllllllltlliltilllllMMIitMMIiili 

GGAGCATCTGCAAGGTGACCTTTGTGAATATGCGGGAGAACTGTGGAGAGGGCGTGCAAA 3397 
CCCGAAAAGTGAG ATGCATGCAG AATACAGCAGATGGCCCTTCTGAACATGTAGAGGATT 3240 

lliillllillillllllillllillllillllllllllltllliililiiiMiiiill 

CCCGAAAAGTGAGATGCATGCAGAATACAGCAGATGGCCCTTCTGAACATGTAGAGGATT 3457 

ACCTCTGTGACCCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTG 3300 

IMMIIIIIIIIIIillllllllllllllllillllillllllllllllMIIMMM 

ACCTCTGTGACCCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTG 3517 

AGGACTGTGTGATATCTGAATGGGGTCCATGG ACCCAATGTGTTTTGCCTTGCAATCAAA 3360 

llllllllllllilllliillllllllllllllillllllllllllllilllllilllll 
AGGACTGTGTG ATATCTG AATGGGGTCCATGGACCCAATGTGTTTTGCCTTGCAATCAAA 3577 



iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMiiiitiiiiiiiiiiiiiii 

GOVGTTTCCGGCAAAGGTCAGCTGATCCCATCAGACAACCAGCTGATGAAGGAAGATCTT 



3637 



GCCCTAATGCTGTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCACTATGATT 3480 

GCCCTAATGCTGTTGAGAAAGAACCCTGT AACCTGAACAAAAACTGCT ACCACTATGATT 3697 

ATAATGTAACAGACTGGAGTACATGTGAGCTOAGTGAGAAGGCAGTTTGTGGAAATGGAA 3540 

lllllllillllllllllMMIIIIIIIIIIIIIIDMIIDtillllllllllllll 

ATAATGTAACAGACTGGAGTACATGTCAGCTGAGTGAG AAGGCAGTTTGTGGAAATGGAA 3757 

TAAAAACAAGGATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATT 360 0 

illiiiiiiiiiiililliiilllliiliiilllliiiiiiiiiiiiiiiiiiiiiiiii 

TAAAAACAAGGATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTG AAATATT 3817 

GTGAAGCGCTTGGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATGCC 3660 

llllllilillllllllilllllllliillllllllitMllilillllllltlllllil 

GTGAAGCGCTTGGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATC 3877 

CTGTGAACTGTCAGCTTTCTGATTGGTCTCCTTGGTCAGAATGTTCTCAAACATGTGGCC 3720 

iiiiiiillliiilllllillilliiiiiiiiiiiiiliilliiiillliiiiiiiMii 

CTGTGAACTGTCAGCTTTCTGATTGGTCTCCTTGGTCAGAATGTTCTCAAACATGT^ 3937 



llllllllllllltllDliilllllllllliilMlllililMllllllllllillM 



3780 



3997 



CATGCCCTTCCCTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGC 384 0 

illllllllllllllillllllilillllllllllllllllllllllllllilillllll 
CATGCCCTTCCCTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGC 4057 



illliiiillllilllliiiillllllllllilliillllllillllDiiiililllli 

AATATGGCCAGTCGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACCAGAA 4117 

CAAGG AACATTTCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGG 3960 

lillllMilllllllllllllllllllllllllMMIIIIlilMIIIIIIIMMII 

CAAGGAACATTTCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGG 4 177 
ATGAGGAATTCTGTGCTGACATTGAACTCATTATAGATGGTAATAAAAATATGGTTCTGG 4020 

ItllllllliilliilUiilllllllllllllilllllllllllinillllillMM 

ATGAGGAATTCTGTGCTGACATTGAACTCATTATAGATGGTAATA7U\AATATGGTTCTGG 4237 
AGGAATCCTGCAGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGA 4 080 

llilillllllllllllllllllllllllllllllllllllllllllllllillllllii 

AGGAATCCTGCAGCCWGCCTTGCCXrAGGTGRCTGTTATTTGAAGGACTGGTCTTCCTGGA 4297 

GCCTGTGTCAGCTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCA 4X40 
llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 



llllllllllllllllllllllllllllllllllllllllllllllllllllllllll 



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Query: 


4201 




Sbjct: 


4418 


5 


Query : 


4261 




Sbjct: 


4478 


10 


Query : 


4321 




Sbjct : 


4538 




Query : 


4381 


15 


Sbjct: 


4598 




Query : 


4441 


20 


Sbjct: 


4658 




Query : 


4501 




Sbjct: 


4718 


25 


Query: 


4561 




Sbjct: 


4778 


30 


Query : 


4621 




Sb j Ct : 


4836 




Query : 


4681 


35 


Sbjct : 


4898 




Query : 


4741 


40 


Sbjct: 


4956 




Query: 


4801 




Sbjct: 


5018 



IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIII 

TAGAAACAAAATCATGTTATGATGGACAGTGCTATGAATATAAATGGATGGCCAGTGCTT 447 7 
GGAAGGGCTCTTCCCGAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGG 4320 

(MllllllllilllllllillliMillllllKIIIIIIIIIMIllllllillllll 

GGAAGGGCTCTTCCCXSAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGG 4 537 
GCraCTTGGTGATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAAC 4380 

lllllllllililltlllllllllllllllililllllllllllllllllllllllllil 

GCTGCTTGGTGATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAAC 4597 

CCCACTCGTACTGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCA 444 0 

lllllilllllllilllllllliilHIMIIIIlllllKlllliliililllllllll 

CCCACTCGTACTGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCA 4657 
TGTCTTCTAACAGCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCA 4 500 

IIIIIIMIIillllllllllllllllllllllllllllllllllllllillllllMII 

TGTCTTCTAACAGCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCA 4 717 

TGGAGGACAAAAGAGG AGATGTGAAAACCAGTCGGGCTGTACATCCAACCCAACCCTCCA 4 560 

illllllllllliiMllilllilllillillllllllllllllllllllllllllllll 
TGGAGGACAAAAGAGGAGATGTGAAAACCAGTCGGGCTGTACATCCAACCCAACCCTCCA 4777 

GTAACCCAGCAGGACGGGGAAGGACCTGGTTTCTACAGCCATTTGGGCCAGATGGG AGAC 4620 

iiiMiiitiiiiiiiniiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii 

GTAACCCAGCAGGACGGGGAAGGACCnXKSTTTCTACAGCCATTTGGGCCAGATGGGAGAC 4837 

TAAAGACCTCGGTTTACGGTGTAGCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCT 4680 

llllllllllllillillillllllilllllllllllllllMMIIIilllllUilll 

TAAAGACCTGGGTTTACGGTGTAGCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCT 4897 
CCATGATTTATCTAGCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGAC 474 0 

llllllllillllllllllMllllllllllllillllllllllllilllllllllllll 

CCATGATTTATCTAGCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGAC 4 957 

TGAAACCTTTAACCTTAGCCTATGATGGAGATGCCGACATGTAACATATAACTTTTCCTG 4800 

IIMIIIIIIIIIilllllllillliillillllllllllllllilllliltllllllll 
TGAAACCTTTAACCTTAGCCTATGATGGAGATGCOSACATGTAACATATAACTTTTCCTG 5017 

GCAACAACCA 4810 

llliiiilii 






Table 3. BLASTN VERSUS GENBANK COMPOSITE

>gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA for KIAA0960 protein,
partial cds - Homo sapiens, 5032 bp. (SEQ ID NO: 59)
Length = 5032

Plus Strand HSPs: 






Score = 19495 (2925.0 bits), Expect = 0.0, P = 0.0
Identities = 3899/3899 (100%), Positives = 3899/3899 (100%), Strand = Plus / Plus



Query:  912 GAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAGGCACTCGTGTA 971
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 GAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAGGCACTCGTGTA 60

Query:  972 AGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTCCAGAATTTGAA 1031
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 AGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTCCAGAATTTGAA 120

Query: 1032 GAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCACGTATGGCTGG 1091
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  121 GAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCACGTATGGCTGG 180

Query: 1092 AGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGCAGGACAAGAGG 1151
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  181 AGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGCAGGACAAGAGG 240

Query: 1152 CGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGGTGTACTGCGTG 1211
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  241 CGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGGTGTACTGCGTG 300

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IIIIIIIIIIMIIIIIIIIIIIIIIIMIIIIIIIIIIIIIillMIIIIIIIIIIIII 

CAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAACAAAGAAGCCTCAAAG 360 
CC7UVTGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGCTGTGCCACATT 1331 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGCTGTGCCACATT 420 
CCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTTGTACTTATGAA 1391 

10 ~ ' iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii Mill mill 

CCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTTGTACTTATGAA 480 

AACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGCGCATTACCAAT 1451 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
15 SbjCC: 481 AACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGCGCATTACCAAT 540 






Query : 


1212 


Sbj ct : 


301 


Query: 


1272 


Sbjct : 


361 


Query: 


1332 


Sbj ct : 


421 


Query: 


1392 


Sbjct: 


481 


Query : 


1452 


Sbjct: 


541 


Query: 


1512 


Sbjct : 


601 


Query : 


1572 


Sbjct : 


661 


Query: 


1632 


Sbjct: 


721 


Query : 


1692 


Sbj ct : 


781 


Query : 


1752 


Sbjct : 


841 


Query: 


1812 


Sbjct: 


901 


Query: 


1872 


Sbjct : 


961 


Query: 


1932 


Sbjct: 


1021 


Query: 


1992 


Sbjct: 


1081 


Query : 


2052 


Sbj ct : 


1141 


Oupirv • 

\^uc^ y - 


2112 


Sbjct : 


1201 


Query: 


2172 


Sbjct: 


1261 


Query: 


2232 



GAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGGAAGCCATTCCC 1511 

lllllllllllllllllllllllllllllllllllillllllllllllllllllllllll 
GAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGGAAGCCATTCCC 600 

TGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACTGCGAGCCAGAT 1571 

IIIIIIIIIIMIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIMIIIIIIIMIII 

TGTGAAGAGCCTGCCTGTTATGACTGG/^AAGCGGTGAGACTGGGAGACTGCGAGCCAGAT 660 



iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiM 

ZVCGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCATCAACAGTGAT 

3AGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTCCCCATCCCTGTGGCCTGT 

30 ^ ' I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

GGAGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTCCCCATCCCTGTGGCCTGT 

GATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGTCCTCCTGCTCA 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIMIMIMIIII 

GATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGTCCTCCTGCTCA 
CACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGATCCATTCTGGCC 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIII 



llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 



45 Query: 1872 AGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCTGGGGCCAGTGC 1931 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
AGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCTGGGGCCAGTGC 1020 

ATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATGGGGAGGCCTCC 1991 

50 ~ I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

ATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATGGGGAGGCCTCC 1080 

TGCTCTGTCGGCATGCAGACAAGAAAAGTCATCTGTGTGCGAGTCAATGTGGGCCAAGTG 2 051 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
55 Sbjct: 1081 TGCTCTGTCGGCATGCAGACAAGAAAAGTCATCTGTGTGCGAGTCAATGTGGGCCAAGTG 1140 

GGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTTGTCTGCTTCCT 2111 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
GGACCCAAAAAATGTCCTGAAAGCCTTCGACCTG7VAACTGTAAGGCCTT6TCTGCTTCCT 1200 



TGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCCCCTCTTCGTGT 2171 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMI 

Sbjct: 1201 TGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCCCCTCTTCGTGT 1260 



65 Query: 2172 AAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGCATCGGGTCATCATTCAGCTGCCA 2 231 

IIIIIIIINIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

AAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGCATCGGGTCATCATTCAGCTGCCA 1320 

GCCAACGGGGGCCGAGACTGCACAGATCCCCTCTATGAAGAGAAGGCCTGTGAGGCACCT 2 291 

70 ^ ' I I I I I I I I 1 1 1 1 1 1 1 1 1 I I I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 I I I I I 1 1 1 1 1 1 1 1 I I I M I I M 
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Sbjct: 


1441 


Query: 
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1501 
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2472 
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1561 
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2532 
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1621 
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2592 
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1681 
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2652 
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1741 
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2712 


Sbjct: 


1801 
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2772 


Sbjct : 


1861 
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2832 
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1921 
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2892 


Sbjct: 


1981 


Query: 


2952 


Sbjct: 


2041 


Query: 


3012 


Sbjct : 


2101 


Query: 


3072 


Sbjct: 


2161 


Query: 


3132 


Sbjct : 


2221 


Query: 


3192 


Sbjct: 


2281 


Query: 


3252 


Sb j Ct : 


2341 



1321 GCCAACGGGGGCCGAGACTGCACAGATCCCCTCTATGAAGAGAAGGCCTGTGAGGCACCT 1380 

CAAGCGTGCCAAAGCTACAGGTGGAAGACTCACAAATGGCGCAGATGCCAATTAGTCCC^ 2351 

lllllllllllllllllllllllllllllllllllllillllllllllllllllllllll 
CAAGCGTGCCAAAGCTACAGGTGGAAGACTCACAAATGGCGCAGATGCCAATTAGTCCCT 1440 

t 

TGGAGCGTGCAACAAGACAGCCCTGGAGCACAGGAAGGCTGTGGGCCTGGGCGACAGGCA 2411 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIII 

TGGAGCGTGCAACAAGACAGCCCTGGAGCACAGGAAGGCTGTGGGCCTGGGCGACAGGCA 1500 
AGAGCCATTACTTGTCGCAAGCAAGATGGAGGACAGGCTGGAATCCATGAGTGCCTAC7VG 2 4 71 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIM 

AGAGCCATTACTTGTCGCAAGCAAGATGGAGGACAGGCTGGAATCCATGAGTGCCTACAG 1560 
TATGCAGGCCCTGTGCCAGCCCTTACCCAGGCCTGCCAGATCCCCTGCCAGGATGACTGT 2531 

MIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIII 

TATGCAGGCCCTGTGCCAGCCCTTACCCAGGCCTGCCAGATCCCCTGCCAGGATGACTGT 1620 
CAATTGACCAGCTGGTCCAAGTTTTCTTCATGCAATGGAGACTGTGGTGCAGTTAGGACC 2 591 

IIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIMIMMIIMI 

CAATTGACCAGCTGGTCCAAGTTTTCTTCATGCAATGGAGACTGTGGTGCAGTTAGGACC 1680 
AGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAAATTCCCATTTG 2 651 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIII 

AGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAAATTCCCATTTG 174 0 
TATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCACAACCTGTGGGG 2711 

IIIIIIIIIIIIIMIIIIIIIIIIIMIMIIIIIIIIIIIIIIIIIMIIIIIIMII 

TATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCACAACCTGTGGGG 1800 
AACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTGTTGCTGGGAATGAAAGTA 2 771 

IIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

AACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTGTTGCTGGGAATGAAAGTA 1860 
CAAGGAGACATCAAGGAATGCGGACAAGGATATCGTTACCAAGCAATGGCATGCTACGAT 2831 

lllllllllllinillMIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIilMII 

CAAGGAGACATCAAGGAATGCGGACAAGGATATCGTTACCAAGCAATGGCATGCTACGAT 1920 
CAAAATGGCAGGCTTGTGGAAACATCTAGATGTAACAGCCATGGTTACATTGAGGAGGCC 2891 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIII 

CAAAATGGCAGGCTTGTGGAAACATCTAGATGTAACAGCCATGGTTACATTGAGGAGGCC 1980 
TGCATCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACTGGTCGCGCTGC 2951 

IIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMM 

TGCATCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACTGGTCGCGCTGC 2040 
AGCAAGTCCTGTGGGAGTGGTGTGAAGGTTCGTTCTAAATGGCTGCGTGAAAAACCATAT 3 011 

llllllilllllllllllllllllllllllllllllllllllllllllllllllllllll 

AGCAAGTCCTGTGGGAGTGGTGTGAAGGTTC6TTCTAAATGGCTGCGTGAAAAACCATAT 2100 

AATGGAGGAAGGCCTTGCCCCAAACTGGACCATGTCAACCAGGCACAGGTGTATGAGGTT 3 071 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
AATGGAGGAAGGCCTTGCCCCAAACTGGACCATGTCAACCAGGCACAGGTGTATGAGGTT 2160 

GTCCCATGCCACAGTGACTGCAACCAGTACCTATGGGTCACAGAGCCCTGGAGCATCTGC 3131 

MIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

GTCCCATGCCACAGTGACTGCAACCAGTACCTATGGGTCACAGAGCCCTGGAGCATCTGC 2220 

AAGGTGACCTTTGTGAATATGCGGGAGAACTGTGGAGAGGGCGTGCAAACCCGAAAAGTG 3191 

lllllllliililllliilillllilillllllllliliilllliiMllllllllllll 

AAGGTGACCTTTGTGAATATGCGGGAGAACTGTGGAGAGGGCGTGCAAACCCGAAAAGTG 2280 

AGATGCATGCAGAATACAGCAGATGGCCCTTCTGAACATGTAGAGGATTACCTCTGTGAC 3251 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
AGATGCATGCAGAATACAGCAGATGGCCCTTCTGAACATGTAGAGGATTACCTCTGTGAC 2340 

CCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTGAGGACTGTGTG 3 311 

MIIIIIIINMIIilllllllllllllllllllllllllllllllllllllMIIIII 

CCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTGAGGACTGTGTG 2400 
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3301 
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4272 


Sbjct: 


3361 


Query : 


4332 



ATATCTGAATGGGGTCCATGGACCCAATGTGTTTTGCCTTGCAATCAAAGCAGTTTCCGG 3371 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIII 

ATATCTGAATGGGGTCCATGGACCCAATGTGTTTTGCCTTGCAATCAAAGCAGTTTCCGG 2460 
CAAAGGTCAGCTGATCCCATCAGACAACCAGCTGATGAAGGAAGATCTTGCCCTAATGCT 3431 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CAAAGGTCAGCTGATCCCATCAGACAACCAGCTGATGAAGGAAGATCTTGCCCTAATGCT 2520 
GTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCACTATGATTATAATGTAACA 3491 

IIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

GTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCACTATGATTATAATGTAACA 2580 
GACTGGAGTACATGTCAGCTGAGTGAGAAGGCAGTTTGTGGAAATGGAATAAAAACAAGG 3551 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

GACTGGAGTACATGTCAGCTGAGTGAGAAGGCAGTTTGTGGAAATGGAATAAAAACAAGG 264 0 
ATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATTGTGAAGCGCTT 3 611 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIMIIIII 

ATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATTGTGAAGCGCTT 2 700 
GGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATGCCCTGTGAACTGT 3671 

IIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIMIIIIIIIMIIIIIIIIIIIIIII 

GGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATGCCCTGTGAACTGT 276 0 
CAGCTTTCTGATTGGTCTCCTTGGTCAGAATGTTCTCAAACATGTGGCCTCACAGGAAAA 3 731 

IIIIIIIIIIMIIIIMIIIMIIIIIIIIIIIMIIIIMIIIIIIIIIIIIIIIIII 

CAGCTTTCTGATTGGTCTCCTTGGTCAGAATGTTCTCAAACATGTGGCCTCACAGGAAAA 2 820 
ATGATCCGAAGACGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGACCATGCCCTTCC 3791 

MIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIII 

ATGATCCGAAGACGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGACCATGCCCTTCC 2 880 
CTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGCAATATGGCCAG 3 851 

MIMIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIII 

CTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGCAATATGGCCAG 2940 

TGGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACCAGAACAAGGAACATT 3911 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
TGGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACCAGAACAAGGAACATT 3000 

TCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGGATGAGGAATTC 3 971 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
TCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGGATGAGGAATTC 3060 

TGTGCTGACATTGAACTCATTATAGATGGTAATAAAAATATGGTTCTGGAGGAATCCTGC 4031 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
TGTGCTGACATTGAACTCATTATAGATGGTAATAAAAATATGGTTCTGGAGGAATCCTGC 3120 

AGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGAGCCTGTGTCAG 4 091 

lllllllllllllllllllllllllllllllllllllllllllllllllllllllillll 
AGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGAGCCTGTGTCAG 3180 

CTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCAGATCCAGACCG 4151 

IIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCAGATCCAGACCG 3240 

GTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCCAGAGCAGATGTTAGAAACAAAA 4211 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIMIIIIIIIIIIII 

GTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCCAGAGCAGATGTTAGAAACAAAA 3300 

TCATGTTATGATGGACAGTGCTATGAATATAAATGGATGGCCAGTGCTTGGAAGGGCTCT 4271 

IIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

TCATGTTATGATGGACAGTGCTATGAATATAAATGGATGGCCAGTGCTTGGAAGGGCTCT 3360 

TCCCGAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGGGCTGCTTGGTG 4331 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIII 

TCCCGAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGGGCTGCTTGGTG 3420 

ATGAGCCAGCCTGATGCCGACT^GGTCTTGTAACCCACCGTGTAGTCAACCCCACTCGTAC 4391 

MMIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIMIIIIIIIIMIIIIIMIIMI 
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Sbjct: 3421 ATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAACCCCACTCGTAC 3480 
TGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCATGTCTTCTAAC 4451 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIMIIIIIIIIIIIIIIIIIIIIIIIIII 

TGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCATGTCTTCTAAC 3540 

AGCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCATGGAGGACAAA 4511 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
AGC7VCCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCATGGAGGACAAA 3600 

AGAGGAGAT6TGAAAACCAGTCGGGCTGTACATCCAACCCAACCCTCCAGTAACCCAGCA 4571 

IIIIIIIIMIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMM 

AGAGGAGATGTGAAAACCAGTCGGGCTGTACATCCAACCCAACCCTCCAGTAACCCAGCA 3 660 
GGACGGGGAAGGACCTGGTTTCTACAGCCATTTGGGCCAGATGGGAGACTAAAGACCTGG 4631 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIMII 

GGACGGGGAAGGACCTGGTTTCTACAGCCATTTGGGCCAGATGGGAGACTAAAGACCTGG 3720 
GTTTACGGTGTAGCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCTCCATGATTTAT 4 691 

20 ^ ' llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

rTACGGTGTAGCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCTCCATGATTTAT 3780 
^GCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGACTGAAACCTTTA 4 751 

IIIIIIIIMIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIM 

^GCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGACTGAAACCTTTA 3840 
CTTAGCCTATGATGGAGATGCCGACATGTAACATATAACTTTTCCTGGCAACAACCA 4810 

IIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIII 
SECP11

A SECP11 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:42) and encoded polypeptide sequence (SEQ ID NO:43) of clone
CG50817-04, directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids encoding
them. FIG. 16 illustrates the nucleic acid and amino acid sequences. This clone
includes a nucleotide sequence (SEQ ID NO:42) of 1447 bp. The nucleotide sequence includes
an open reading frame (ORF) beginning with an ATG initiation codon encoding a polypeptide of
224 amino acid residues (SEQ ID NO:43). The start codon is located at nucleotides 520-522 and
the stop codon is located at nucleotides 1192-1194. Putative untranslated regions, if any, are
found upstream from the initiation codon and downstream from the termination codon. The
protein encoded by clone CG50817-04 is predicted by the PSORT program to localize in the
cytoplasm with a certainty of 0.4500. The PSORT and SignalP programs predict that the
protein has no amino-terminal signal sequence.
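
By way of non-limiting illustration, the stated coordinates can be checked arithmetically: an open reading frame running from nucleotide 520 through nucleotide 1194 spans 675 bases, or 225 codons, one of which is the stop codon. The following is a minimal sketch; the function name and the 1-based, inclusive coordinate convention are assumptions made for the example and are not part of the disclosure.

```python
def orf_residue_count(start_first: int, stop_last: int) -> int:
    """Return the number of encoded amino acid residues for an ORF.

    Coordinates are assumed to be 1-based and inclusive: start_first is the
    first base of the initiation codon and stop_last is the last base of the
    termination codon. The stop codon itself encodes no residue.
    """
    span = stop_last - start_first + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3 - 1


if __name__ == "__main__":
    sequence_length = 1447                     # length of SEQ ID NO:42 as stated
    residues = orf_residue_count(520, 1194)    # start codon 520-522, stop codon 1192-1194
    five_prime_utr = 520 - 1                   # bases upstream of the initiation codon
    three_prime_utr = sequence_length - 1194   # bases downstream of the termination codon
    print(residues, five_prime_utr, three_prime_utr)
```

Under these assumptions the count agrees with the 224 residues stated for SEQ ID NO:43, leaving putative untranslated regions of 519 and 253 bases.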

Novel peptidase (HPEP-8)-like proteins are related to conditions of failure to thrive,
nutritional edema, and hypoproteinemia with normal sweat electrolytes, as reported by Townes et
al. (J. Pediat. 71: 220-224, 1967) for 2 affected male infants. This condition could be treated by a
protein hydrolysate diet. Morris and Fisher (Am. J. Dis. Child. 114: 203-208, 1967) reported an
affected female who also had imperforate anus, a result of a defect in the synthesis of the
enterokinase which activates proteolytic enzymes produced by the pancreas. Oral pancreatin
represents a therapeutically successful form of enzyme replacement. Trypsin, like elastase, is a
member of the pancreatic family of serine proteases. MacDonald et al. (J. Biol. Chem. 257:
9724-9732, 1982) reported nucleotide sequences of cDNAs representing 2 pancreatic rat
trypsinogens. The trypsin gene is on mouse chromosome 6 (Honey et al., Somat. Cell Molec.
Genet. 10: 369-376, 1984). Carboxypeptidase A and trypsin are a syntenic pair conserved in
mouse and man. Emi et al. (Gene 41: 305-310, 1986) isolated cDNA clones for 2 major human
trypsinogen isozymes from a pancreatic cDNA library. The deduced amino acid sequences had
89% homology and the same number of amino acids (247), including a 15-amino acid signal
peptide and an 8-amino acid activation peptide. Southern blot analysis of human genomic DNA
with the cloned cDNA as a probe showed that the human trypsinogen genes constitute a family
of more than 10. The gene encoding trypsin-1 (TRY1) is also referred to as serine protease-1
(PRSS1). Rowen et al. (Science 272: 1755-1762, 1996) found that there are 8 trypsinogen genes
embedded in the beta T-cell receptor locus or cluster of genes (TCRB) mapping to 7q35. In the
685-kb DNA segment that they sequenced they found 5 tandemly arrayed 10-kb locus-specific
repeats (homology units) at the 3-prime end of the locus. These repeats exhibited 90 to 91%
overall nucleotide similarity, and embedded within each is a trypsinogen gene. Alignment of
pancreatic trypsinogen cDNAs with the germline sequences showed that these trypsinogen genes
contain 5 exons that span approximately 3.6 kb. They denoted 8 trypsinogen genes T1 through
T8 from 5-prime to 3-prime. Some of the trypsinogen genes are expressed in nonpancreatic
tissues where their function is unknown. Rowen et al. (Science 272: 1755-1762, 1996) noted that
the intercalation of the trypsinogen genes in the TCRB locus is conserved in mouse and chicken,
suggesting shared functional or regulatory constraints, as has been postulated for genes in the
major histocompatibility complex (such as class I, II, and III genes) that share similar long-term
organizational relationships. The gene of the invention is a novel serine protease containing a trypsin
domain but localized on chromosome 16.

The sequence of the invention was derived by laboratory cloning of cDNA fragments
covering the full length and/or part of the DNA sequence of the invention, and/or by in silico
prediction of the full length and/or part of the DNA sequence of the invention from public human
sequence databases.

The laboratory cloning was performed using one or more of the methods summarized as:
SeqCalling™ Technology, where cDNA was derived from various human samples representing
multiple tissue types, normal and diseased states, physiological states, and developmental states
from different donors. Samples were obtained as whole tissue, cell lines, primary cells or tissue
cultured primary cells and cell lines. Cells and cell lines may have been treated with biological or
chemical agents that regulate gene expression, for example, growth factors, chemokines, steroids.
The cDNA thus derived was then sequenced using CuraGen's proprietary SeqCalling technology.
Sequence traces were evaluated manually and edited for corrections if appropriate. cDNA
sequences from all samples were assembled with themselves and with public ESTs using
bioinformatics programs to generate CuraGen's human SeqCalling database of SeqCalling
assemblies. Each assembly contains one or more overlapping cDNA sequences derived from one
or more human samples. Fragments and ESTs were included as components for an assembly
when the extent of identity with another component of the assembly was at least 95% over 50 bp.
Each assembly can represent a gene and/or its variants such as splice forms and/or single
nucleotide polymorphisms (SNPs) and their combinations.
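
By way of non-limiting illustration, the inclusion criterion of at least 95% identity over 50 bp can be expressed as a sliding-window test on two overlapping sequences. The sketch below is not the proprietary SeqCalling assembly procedure. It assumes the two sequences have already been aligned without gaps and trimmed to the same length, and it reads the criterion as requiring at least one 50-base window with 95% or more matching positions. The function name and parameters are hypothetical.

```python
def meets_inclusion_criterion(a: str, b: str,
                              min_identity: float = 0.95,
                              window: int = 50) -> bool:
    """Return True if some `window`-base stretch of two pre-aligned,
    equal-length sequences has identity of at least `min_identity`."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if len(a) < window:
        return False  # overlap is shorter than the required 50 bp

    # Per-position match flags (case-insensitive comparison).
    matches = [x == y for x, y in zip(a.upper(), b.upper())]
    required = min_identity * window

    # Count matches in the first window, then slide one base at a time.
    count = sum(matches[:window])
    if count >= required:
        return True
    for i in range(window, len(matches)):
        count += matches[i] - matches[i - window]
        if count >= required:
            return True
    return False
```

With the default parameters, a 50-base overlap containing two mismatches (96% identity) satisfies the test, whereas one containing three mismatches (94% identity) does not.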

Exon Linking, where the cDNA coding for the sequence was cloned by polymerase chain
reaction (PCR) using the following primers: 5'CTGCTGACCAACACAGCTGCTCAC3' and
5'GACAGGGGCAGTAATGCCATTTGC3' on the following pools of human cDNAs: Pool 1 -
Adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain -
substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung,
heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate,
salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid,
trachea, uterus.

Primers were designed based on in silico predictions for the full length or part (one or
more exons) of the DNA/protein sequence of the invention or by translated homology of the
predicted exons to closely related human sequences or to sequences from other species. Usually
multiple clones were sequenced to derive the sequence which was then assembled similar to the
SeqCalling process. In addition, sequence traces were evaluated manually and edited for
corrections if appropriate.

Variant sequences are also included in this application. A variant sequence can include a
single nucleotide polymorphism (SNP). A SNP can, in some instances, be referred to as a
"cSNP" to denote that the nucleotide sequence containing the SNP originates as a cDNA. A SNP
can arise in several ways. For example, a SNP may be due to a substitution of one nucleotide for
another at the polymorphic site. Such a substitution can be either a transition or a transversion. A
SNP can also arise from a deletion of a nucleotide or an insertion of a nucleotide, relative to a
reference allele. In this case, the polymorphic site is a site at which one allele bears a gap with
respect to a particular nucleotide in another allele. SNPs occurring within genes may result in an
alteration of the amino acid encoded by the gene at the position of the SNP. Intragenic SNPs may
also be silent, however, in the case that a codon including a SNP encodes the same amino acid as
a result of the redundancy of the genetic code. SNPs occurring outside the region of a gene, or in
an intron within a gene, do not result in changes in any amino acid sequence of a protein but may
result in altered regulation of the expression pattern, for example, alteration in temporal
expression, physiological response regulation, cell type expression regulation, intensity of
expression, or stability of transcribed message.
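
By way of non-limiting illustration, the distinctions drawn above can be sketched in a few lines of code. A substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. A coding SNP is silent when the original and substituted codons encode the same amino acid under the standard genetic code. The function names below are hypothetical, and the codon table is the standard code written in T, C, A, G order.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code, codons enumerated in TCAG order at each position.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("ref and alt must be two different DNA bases")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def is_silent(codon: str, position: int, alt: str) -> bool:
    """Return True if replacing the base at `position` (0, 1 or 2) of `codon`
    with `alt` leaves the encoded amino acid unchanged."""
    codon = codon.upper()
    mutated = codon[:position] + alt.upper() + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]
```

For example, a G-to-A change at the third position of the codon CTG is a transition and is silent, because CTG and CTA both encode leucine. The same change in ATG is not silent, because ATA encodes isoleucine rather than methionine.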

The DNA sequence and protein sequence for a novel Peptidase (HPEP-8)-like gene or
one of its splice forms was obtained solely by exon linking and is reported here as CuraGen Acc.
No. CG50817-04.

Real-time expression analysis was performed on SECP11 (clone CG50817-04). The
results demonstrate that RNA homologous to this clone is present in multiple tissue and cell
types.

Accordingly, SECP11 nucleic acids according to the invention can be used to identify
one or more of these tissue types. The presence of RNA sequences homologous to a SECP11
nucleic acid in a sample indicates that the sample contains one or more of the above tissue types.

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 1086 of 1087 bases (99%) identical to a human peptidase,
HPEP-8 mRNA (patn:A37664). The full amino acid sequence of the protein of the invention was
found to have 254 of 255 amino acid residues (99%) identical to, and 254 of 257 amino acid
residues (99%) similar to, the 571 amino acid residue ptnr:patp:Y41704 Human pro351
protein sequence from Homo sapiens.
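
By way of non-limiting illustration, the percentages quoted above follow from dividing the number of matching positions by the number of aligned positions. The sketch below returns the unrounded value and leaves the choice of rounding to the reader, since the text reports whole-number percentages only. The function name is hypothetical.

```python
def percent_identity(matches: int, aligned: int) -> float:
    """Return matches / aligned as a percentage, without rounding."""
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("require 0 <= matches <= aligned and aligned > 0")
    return 100.0 * matches / aligned


if __name__ == "__main__":
    # Counts taken from the database search described in the text.
    for label, matches, aligned in [
        ("nucleotide identity", 1086, 1087),
        ("amino acid identity", 254, 255),
        ("amino acid similarity", 254, 257),
    ]:
        print(f"{label}: {percent_identity(matches, aligned):.2f}%")
```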

The presence of identifiable domains in the protein disclosed herein was determined by
searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain, Prints and then
determining the Interpro number by crossing the domain match (or numbers) using the Interpro
website. The results indicate that this protein contains the following protein domains (as defined
by Interpro) at the indicated positions: domain name trypsin at amino acid positions 15 to 179.
This indicates that the sequence of the invention has properties similar to those of other proteins
known to contain this/these domain(s) and similar to the properties of these domains.
Chromosomal information:

The Peptidase (HPEP-8) disclosed in this invention maps to chromosome 16. This
information was assigned using OMIM, the electronic northern bioinformatic tool implemented
by CuraGen Corporation, public ESTs, public literature references and/or genomic clone
homologies. This was executed to derive the chromosomal mapping of the SeqCalling
assemblies, genomic clones, literature references and/or EST sequences that were included in
the invention.

Tissue expression 

The Peptidase (HPEP-8) disclosed in this invention is expressed in at least the following
tissues: Adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus,
brain - substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal
lung, heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta,
prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis,
thyroid, trachea, uterus. This information was derived by determining the tissue sources of the
sequences that were included in the invention including but not limited to SeqCalling sources,
public EST sources, and/or RACE sources.

Cellular Localization and Sorting 

The SignalP, Psort and/or Hydropathy profile for the Peptidase (HPEP-8)-like protein are
shown in Table 7. The results predict that this sequence has no signal peptide and is likely to be
localized in the cytoplasm with a certainty of 0.4500 predicted by PSORT.

The proteins of the invention encoded by clone CG50817-04 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone CG50817-04 protein.

Functional Variants and Homologs

The novel nucleic acid of the invention encoding a Peptidase (HPEP-8)-like protein
includes the nucleic acid whose sequence is provided in Figure 16, or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base while still encoding a protein that maintains its Peptidase
(HPEP-8)-like activities and physiological functions, or a fragment of such a nucleic acid. The invention
further includes nucleic acids whose sequences are complementary to those just described,
including nucleic acid fragments that are complementary to any of the nucleic acids just
described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least in
part to enhance the chemical stability of the modified nucleic acid, such that they may be used,
for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the
mutant or variant nucleic acids, and their complements, up to 1% of the residues may be so
changed.

The novel protein of the invention includes the Peptidase (HPEP-8)-like protein whose 
sequence is provided in Figure 16. The invention also includes a mutant or variant protein any of 
whose residues may be changed from the corresponding residue shown in Figure 16 while still 
encoding a protein that maintains its Peptidase (HPEP-8)-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 1% of 
the residues may be so changed. 

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab, 
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the 
invention. Also encompassed within the invention are peptides and polypeptides comprising 
sequences having high binding affinity for any of the proteins of the invention, including such 
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the 
surface of a carrier) such as a bacteriophage particle. 

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, and map location for the Peptidase 
(HPEP-8)-like protein and nucleic acid disclosed herein suggest that this Peptidase (HPEP-8) 
may have important structural and/or physiological functions characteristic of the serine protease 
family. Therefore, the nucleic acids and proteins of the invention are useful in potential 
diagnostic and therapeutic applications and as a research tool. These include serving as a specific 
or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or 
amount of the nucleic acid or the protein is to be assessed, as well as potential therapeutic 
applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) 
an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid 
useful in gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue 
regeneration in vitro and in vivo, and (vi) a biological defense weapon. 

The nucleic acids and proteins of the invention are useful in potential diagnostic and 
therapeutic applications implicated in various diseases and disorders described below and/or 
other pathologies. For example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis; 
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS; 
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma; 
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage 
disease; obesity; nutritional edema; hypoproteinemia; and other diseases, disorders and 
conditions of the like. 

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic 
methods. 

Table 4. BLASTN identity search for the nucleic acid of the invention versus GenBank. 

>patn:A37664 Human peptidase, HPEP-8 coding sequence - Homo sapiens, 1661 bp. (SEQ ID NO: 60) 

Length = 1661 

Plus Strand HSPs: 

Score = 5426 (814.1 bits), Expect = 5.1e-240, P = 5.1e-240 
Identities = 1086/1087 (99%), Positives = 1086/1087 (99%), Strand = Plus / Plus 



Query:    3 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 62
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 60

Query:   63 CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 122
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 120

Query:  123 TGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGG 182
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  121 TGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGG 180

Query:  183 AGAAGAGAAGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCACAGGA 242
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  181 AGAAGAGAAGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCACAGGA 240

Query:  243 CTGGGGGAAAGAGCTGCAATCAGAGGGTGTCTGCCATAGCTGGGCTCAGGCATCTGTCCT 302
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  241 CTGGGGGAAAGAGCTGCAATCAGAGGGTGTCTGCCATAGCTGGGCTCAGGCATCTGTCCT 300

Query:  303 TGGCTTTGTTGCCTGGCTCCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGA 362
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  301 TGGCTTTGTTGCCTGGCTCCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGA 360

Query:  363 CGGACACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGC 422
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  361 CGGACACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGC 420

Query:  423 TCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGG 482
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  421 TCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGG 480

Query:  483 GGCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGT 542
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  481 GGCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGT 540

Query:  543 AGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTG 602
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  541 AGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTG 600

Query:  603 GGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGA 662
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  601 GGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGA 660

Query:  663 GGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGT 722
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  661 GGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGT 720

Query:  723 AGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTA 782
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  721 AGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTA 780

Query:  783 CACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACT 842
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  781 CACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACT 840

Query:  843 GGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGA 902
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  841 GGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGA 900

Query:  903 GCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGT 962
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  901 GCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGT 960

Query:  963 GCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGA 1022
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  961 GCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGA 1020

Query: 1023 TGGCAGCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTG 1082
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1021 TGGCAGCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTG 1080

Query: 1083 TGAGGCC 1089
            ||||| |
Sbjct: 1081 TGAGGGC 1087

Score = 1931 (289.7 bits), Expect = 3.7e-82, P = 3.7e-82 
Identities = 635/848 (74%), Positives = 635/848 (74%), Strand = Plus / Plus 





Query : 


600 


60 


Sbjct: 


818 




Query: 


657 




Sbjct: 


874 


65 






Query: 


713 




Sbjct : 


934 


70 


Query : 


771 



III I III! Ill II II I Mil II Mill III I MM I 



I II 



I I II 



II III II I I II II II I II Mill I III 



I II 



MM I 



III III I II I III 



15 



35 



Sb j Ct : 


990 


Query: 


831 


Sbjct : 


1045 


Query: 


887 


Sbj ct : 


1102 


Query : 


945 


Sbjct: 


1160 


Query : 


1002 


Sbjct: 


1218 


Query: 


1061 


Sbjct: 


1272 


Query: 


1117 


Sbjct: 


1332 


Query: 


1177 


Sbjct : 


1391 


Query: 


1237 


Sbjct: 


1451 


Query: 


1297 


Sbj ct : 


1511 


Query: 


1357 


Sbjct: 


1571 


Query: 


1417 


Sbjct: 


1631 



I M II II II lllll I I II Mill I MM II 

:GGCTGCATGCAGC - TCCTGGGGGTGATGGCA - - GCCCTATT - CTGCCGGGGAT 1044 



IIM II I I I I I II MM III I III I III MM 



II III III II III I I lllll II lllll II III I I II 

ACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGAT 1159 

- CA - GCTCCCTCCA - GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 1001 
I II II I III III I I I II I II MM I I II 



I I MM I II II I III III III MM III I I I II I 



Query: 1061 CTGTGGGTG-A-GCTGCCCAGCTGTGAG--GCCAACCAACCAGCTGCTGACAGGGGACCT 1116
            ||| |  || | ||||||  ||    |   ||||||||||||||||||||||||||||||
Sbjct: 1272 CTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACCT 1331

Query: 1117 GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1176
            |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1332 GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1390

Query: 1177 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1236
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1391 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1450

Query: 1237 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1296
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1451 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1510

Query: 1297 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1356
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1511 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1570

Query: 1357 CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1416
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1571 CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1630

Query: 1417 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1447
            |||||||||||||||||||||||||||||||
Sbjct: 1631 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1661

Table 5. BLASTP identity search for the protein of the invention versus Non-Redundant 
Composite and GenSeq for the Peptidase (HPEP-8)-like protein of the invention. 

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO: 61) 

Length = 571 

Plus Strand HSPs: 

Score = 1372 (483.0 bits), Expect = 1.5e-170, Sum P(2) = 1.5e-170 
Identities = 254/255 (99%), Positives = 254/255 (99%), Frame = +1 

Query:  322 QGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQ 501
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  239 QGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQ 298

Query:  502 SPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAA 681
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  299 SPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAA 358

Query:  682 HCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRP 861
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  359 HCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRP 418

Query:  862 LCLPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILP 1041
            ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  419 LCLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILP 478

Query: 1042 GMVCTSAVGELPSCE 1086
            |||||||||||||||
Sbjct:  479 GMVCTSAVGELPSCE 493



Score = 315 (110.9 bits), Expect = 1.5e-170, Sum P(2) = 1.5e-170 
Identities = 56/56 (100%), Positives = 56/56 (100%), Frame = +1 

Query:    4 DTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 171
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  184 DTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 239

Score = 225 (79.2 bits), Expect = 8.7e-15, P = 8.7e-15 
Identities = 71/203 (34%), Positives = 95/203 (46%), Frame = +1 

25 Query: 586 PSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPE- -EWSVGLGT RP 741 

I lll + j + II I l-lk^ llllllll I I III II- I 

Sbjct: 63 PGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSWSWLGSLQREGLSP 122 

Query: 742 - -EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWV 915 

30 II k I I II I I hill II I I Mill III I 

Sbjct: 123 GAEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH TPLCLPQPAHRFPFGASCWA 178 

Query: 916 LGRARPGAGI -SSLQTVPVTLLGPRACS RLHTU^PGGDGSPILPGMVCTSAVGELPS 1080 

I ^ + +1+ + + 1+ 1+ +M + I II hi II 

35 Sbjct: 179 TGWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSN"-PARPGMLCG---GPQPG 233 



Query: 1081 CEANQPAADRGPGHSQEQENAGRQMALLPLSS 1176 

II I - I H 

Sbjct: 234 VQGPCQGDSGGPVLCLEPDGHWVQAGIISFAS 265 

Score = 102 (35.9 bits), Expect = 7.2e-32, Sum P(2) = 7.2e-32 
Identities = 27/84 (32%), Positives = 42/84 (50%), Frame = +1 



Query- 295 SVLGFVAWLQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNT7VAHSSWLQAR 474 

45 I +1 ^ +1 II I I I II Ih M +1 I I h k + 

Sb j Ct : 484 SAVGELPSCEGLSGAP - LVHEVRGTWFLAGLHSFGDACQGPARPAVFTALPAYEDWVSS - 541 

Query: 475 VQGAAFLAQSPETPEMSDEDSCVA 546 

+ h II II Ihl 
50 Sbjct: 542 LDWQVYFAEEPE-PE-AEPGSCLA 563 

Table 6. BLASTN identity search versus the human SeqCalling database for the Peptidase 
(HPEP-8)-like protein of the invention. 

>s3aq: 132854740 Category D: 12 frag (12 non-5 • sig-CG), 636 bp. (SEQ ID NO:62) 
Length = 636 

Minus Strand HSPs: 

Score = 1423 (213.5 bits), Expect = 7.0e-59, P = 7.0e-59 
Identities = 313/343 (91%), Positives = 313/343 (91%), Strand = Minus / Plus 
Query: 1001 AGCCGGCTGCAG-GCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGAT 943 

III IMIII I III I Ml II II II III I Mil 

Sbjct: 295 AGCTGGCTGCCCCGGCCT-GCAGGTTGGATGCSACAGCAGCCCTGGCCCT-GTGCCCACCT 352 
65 Query: 942 GCCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 883 

lllllllllllllllilllllllllMIIIIIIIIIIIII Mill Mill llllllll I 

Sbjct: 353 ACCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 412 
Query: 88 2 GTCAGCATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 823 

70 II I II I M II II I II II 1 1 1 II I M II II I II I M II I II II I II i II II M i I II II I 
Sbjct: 413 GTCAGGATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 472 

CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 763 

IIIIIIIIMIIIIIIIIIIMIIIIIIMIIIMIIIIIIIMMIMMMMMIII 

CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 532 
CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 703 

IIIIIIIIIIIMMIIMIMIIIIIIIIIMIIIIMIIIIIMIIIIIIIIIIIIII 

CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 592 

CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 659 

llllllllllllllllllllllllllllllllllllllllllll 
CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 636 

Score = 757 (113.6 bits), Expect = 1.7e-28, P = 1.7e-28 
Identities = 165/179 (92%), Positives = 165/179 (92%), Strand = Minus / Plus 

AGGTCCCCTGTCAGCT^GCTGGTTGGTTGGCCTCACAGCTGGGCAGCTCACCCACAGCACT 1 0 5*3 

Mil III I INI II I lllllllllllllllilllllllllllllll 

AGGTAAGGTGTGGGGGCCTGG- -GGCTCACCTCACAGCTGGGCAGCTCACCCACAGCACT 162 

GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 997 

llllllllllllllllllllllllllllllllllllllllllllll llllllllllllll 
GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 222 

GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 938 

lllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGC7VCTGTCTGGAGGGAGCTGATGCCTG 281 

>s3aq: 134913963 Category E: 1 frag (1 non-CG EST), 415 bp. 
Length = 415 

Plus Strand HSPs: 

Score = 297 (44.6 bits), Expect = 1.1e-06, P = 1.1e-06 
Identities = 61/63 (96%), Positives = 61/63 (96%), Strand = Plus / Plus 





Sbjct : 


413 




Query: 


822 


5 


Sbjct: 


473 




Query: 


762 


10 


Sbjct: 


533 




Query: 


702 




Sbjct: 


593 


15 


Score 


=> 757 




Identities ' 




Query: 


1116 


90 


Sbjct : 


105 




Query: 


1056 


25 


Sbjct: 


163 




Query: 


996 




Sbjct: 


223 



Query: 1385 TTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 1444
            ||| | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   10 TTGGTGTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 69

Query: 1445 ATT 1447
            |||
Sbjct:   70 ATT 72
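
The "Minus / Plus" HSPs in Table 6 report the query on its reverse strand, with query coordinates running downward. As a minimal sketch of that relationship, the following takes plus-strand query bases 982-1001 as read off the Query 963-1022 row of Table 4 and reverse-complements them; the helper name `reverse_complement` is illustrative and handles only unambiguous uppercase bases.

```python
# Watson-Crick complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Plus-strand query bases 982-1001 (from the Query 963-1022 row of Table 4).
plus_strand = "CCTAGGGCCTGCAGCCGGCT"

# Reading the same bases on the minus strand, from position 1001 downward.
minus_strand = reverse_complement(plus_strand)
print(minus_strand)
```

With the gap character removed, the result corresponds to the first twenty query bases of the Query 1001 row of the first minus-strand HSP in Table 6.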
Table 7. ClustalW alignment of the protein of the invention with similar peptidase 
(HPEP-8)s. 



ClustalW alignment of the protein of the invention. 

CO50817-04 

Y41704 MLLSS LV S L AOSV Y LAWl LF F VLYDFC I VC I T T YAI NVS LMWLS F RKVQEPQOKAKR HON 

Y90291 

CO50817-04 

Y41704 T VPO EWPWQASVRRQOAH I C SOS LVADT WV LT AAHC F EKAA ATE LNSWSVVLOSLQR EOL 

Y90291 

CO508 17-04 - 

Y41704 S POA E EVOVAALQL P RA YNH Y SQOSDLA L LQ LAHPTTHTP L C LP Q P AHRFP FOASCWAT O 

Y90291 

CO50817-04 

Y41704 WDQDT SDAPOTLRNLRLRLI SRPTCNCI Y NQ LHQRHLSNP A R PGMLCOOPQPOVQO P CQO 

V90291 

CG508 17-04 

y41704 DSOOP VLCLEPDOHWVQAOI I SF ASSC A Q ED A P VLLTNTAA HS SWLQ ARVQOAAF LA QSP 

Y90291 

CG50817434 - - - - 
Y41704 ETP E 
Y90291 

CG50817-04 
Y41704 
Y90291 

CO508174D4 
Y41704 
Y90291 

CO50817-04 Jci^WiadiW J^>i3l gN O P HAD R O P O ijE0DEOEN AHRQMALfClPLSSiaPCHV 

Y41704 
Y90291 

CO50817-04 
Y41704 
Y90291 



vISDEDSCVACGS LRTAGPQAGAPS P WP WE ARLMHQGQ L AC 0 GA L VSE EAV LT A A HC 
MSDEDSCVACGS LRTAGPQAGAPS P WP WEARLMHQOQ L AC O GA L VSE EAV LT A A HC 
MSDEDSCVACGS LRTAGPQAGAPS P WP WE ARLMHQGQ L AC G OA L VSE EA V LT A A H C 



F IGRQAP EEWSVGLGTRPEEWGLKQLI L H G A Y TH P EGGYDM A LL L LAQ PVTLGAS LR P LC 
F IGRQAP EEWSVGLGTRPEEWGLKQLI L H G AY TH P EOOYDMA LL L LAQ PVTLGAS LR PLC 
F IGRQAP EEWSVGLGTRPEEWGLKQLI L H O A Y TH P EGG YDM A L L L L AQ PVT LGAS LR P LC 



LPYADHHLPDGERGWV LGRARPGAGISS L Q T V P V T L LGP R A C S R L H A A PGGDGS P I L P GM 
LPYQDHHLPDGERGWV LGRARPGAOISS L Q T V P VTLLGP RA C S R LH A A PGGDGS P I L PGM 
LPYA DHH LP DGERGWV LGRARPGAGISS L Q T V P VT LLGP RA C S R LHA A PGGDGS P I L PGM 



VCTS AVGELPSCEi_, 

VCTS A VGELPSCEGLSGAPLVHEVRGTWF L AG LHS F GDACQ O P A R P A V FTALP A YE D WV S 
VCTS AVGELPSCEGLSGAPLVHEVRGTWF L AG LHS FGDACQ G P A R P A V FTALP AYEDW^/ S 



SLDWQ VY FAEEPEP E A E P GS C LAN I SQ P T SC 
SLDWQ VYF AEEPEP E A E P GS C LAN I SQ P T SC 



Information for the ClustalW proteins: 

Accno        Common Name                                  Length 
CG50817-04   novel Peptidase (HPEP-8)-like protein 
Y41704       Human PRO351 protein sequence.               571 
Y90291       Human peptidase, HPEP-8 protein sequence.    267 



In the alignment shown above, black outlined amino acid residues indicate regions of 
conserved sequence (i.e., regions that may be required to preserve structural or functional 
properties); greyed amino acid residues can be mutated to a residue with comparable steric 
and/or chemical properties without altering protein structure or function (e.g. L to V, I, or M); 
non-highlighted amino acid residues can potentially be mutated to a much broader extent without 
altering structure or function. 

Psort, SignalP and hydropathy results for the Peptidase (HPEP-8)-like protein of the invention. 

Table 8. Psort, SignalP and Pfam Results for CG50817-04, Peptidase (HPEP-8)-like 
Protein. 

PSORT data: 

cytoplasm --- Certainty=0.4500(Affirmative) < succ> 
microbody (peroxisome) --- Certainty=0.3000(Affirmative) < succ> 
lysosome (lumen) --- Certainty=0.2415(Affirmative) < succ> 
mitochondrial matrix space --- Certainty=0.1000(Affirmative) < succ> 

SignalP data: 

# Measure   Position   Value   Cutoff   Conclusion 
max. C      57         0.130   0.37     NO 
max. Y      55         0.066   0.34     NO 
max. S      32         0.311   0.88     NO 
mean S      1-54       0.142   0.48     NO 

PFAM data: 

Scores for sequence family classification (score includes all domains): 

Model     Description   Score   E-value   N 
trypsin   Trypsin       69.7    2.7e-21   1 


SECP12 

A SECP12 nucleic acid and polypeptide according to the invention includes the 
nucleic acid sequence (SEQ ID NO:44) and encoded polypeptide sequence (SEQ ID NO:45) of 
clone CG50817-05 directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids 
encoding them. This is a related variant of SECP11, clone CG50817-04. Figure 17 illustrates 
the nucleic acid and amino acid sequences, respectively. This clone includes a 
nucleotide sequence (SEQ ID NO:44) of 1592 bp. The nucleotide sequence includes an open 
reading frame (ORF) beginning with an ATG initiation codon at nucleotides 19-21 and ending 
with a TGA codon at nucleotides 1582-1584. The encoded protein having 521 amino acid 
residues is presented using the one-letter code in Figure 17. 
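
The stated ORF coordinates and protein length can be cross-checked arithmetically. The following is a minimal sketch, assuming 1-based inclusive nucleotide coordinates and that the terminal stop codon is counted in the ORF span but not in the protein; the function name `orf_protein_length` is illustrative.

```python
def orf_protein_length(first_nt: int, last_nt: int) -> int:
    """Number of encoded residues for an ORF spanning first_nt..last_nt.

    Coordinates are 1-based and inclusive; the span is assumed to start at
    the first base of the initiation codon and end at the last base of the
    stop codon, which does not encode a residue.
    """
    span = last_nt - first_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1  # subtract the stop codon


# ATG at nucleotides 19-21, TGA at nucleotides 1582-1584.
n_residues = orf_protein_length(19, 1584)
print(n_residues)
```

The span of 1566 nucleotides corresponds to 522 codons, that is, 521 residues plus the stop codon, in agreement with the length given above.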

The protein encoded by clone CG50817-05 is predicted by the PSORT program to 
localize in the plasma membrane with a certainty of 0.6850, and appears to be a signal protein 
(see Table 13 below). 
The sequence identified by exon linking was extended in silico using information from at 
least some of the following sources: SeqCalling assemblies 153687026, 152507187, 153485867, 
153485864 and genomic clone gb_AC009088.5. 

The genomic clone was analyzed by Genscan, Grail and/or other programs to identify 
regions that were putative exons, i.e., putative coding sequences. The clone was also analyzed 
by TBLASTN, TFASTN, TFASTA, BLASTX and/or other programs, i.e., hybrid, to identify 
genomic regions translating to proteins with similarity to the original protein or protein family of 
interest. The following genomic sequence was thus included in the invention: gb_AC009088.5. 

The DNA sequence and protein sequence for a novel Peptidase-like gene or one of its 
splice forms thus derived is reported here as the invention CG50817-05. Genomic clones having 
regions with 100% identity to the extended sequence thus obtained were identified by BLASTN 
searches with the extended sequence against human genomic databases. The genomic clone was 
selected for further analysis because this identity indicates that these clones contain the genomic 
locus for these SeqCalling assemblies. 
The regions defined by all approaches were then manually integrated and manually 
corrected for apparent inconsistencies that may have arisen, for example, from miscalled bases in 
the original fragments used, or from discrepancies between predicted homology to a protein of 
similarity, to derive the final sequence of the invention CG50817-05 reported here. When 
necessary, the process to identify and analyze SeqCalling assemblies, ESTs and genomic clones 
was reiterated to derive the full length sequence. 

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid 
sequence of this invention has 1135 of 1140 bases (99%) identical to a gb:GENBANK-ID: 
Z34002 human PRO351 nucleotide sequence mRNA from Homo sapiens (Table 9). The full amino 
acid sequence of the protein of the invention was found to have 476 of 493 amino acid residues 
(96%) identical to, and 479 of 493 amino acid residues (97%) similar to, the 571 amino acid 
residue patp:Y41704 human PRO351 protein from Homo sapiens (Table 10). 
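
The percentages quoted above follow directly from the match counts. The sketch below recomputes them, assuming that the reported figures are whole percentages obtained by truncation rather than rounding (an assumption consistent with the values quoted here, e.g. 1135/1140 reported as 99%); the function name `percent_truncated` is illustrative.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Whole-number percentage of matches over aligned positions, truncated."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts taken from the similarity statements above.
nucleotide_identity = percent_truncated(1135, 1140)
protein_identity = percent_truncated(476, 493)
protein_similarity = percent_truncated(479, 493)

print(nucleotide_identity, protein_identity, protein_similarity)
```

Integer arithmetic is used so that the truncation step does not depend on floating-point rounding.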

A multiple sequence alignment is given in Table 12, with the protein of the invention 
being shown on the first line in a ClustalW analysis comparing the protein of the invention with 
related protein sequences. 
The presence of identifiable domains in the protein disclosed herein was determined by 
searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain, Prints and then 
determining the Interpro number by crossing the domain match (or numbers) using the Interpro 
website. The results indicate that this protein contains the following protein domains (as defined 
by Interpro) at the indicated positions: domain name trypsin at amino acid positions 61 to 279, 
and 312 to 476. This indicates that the sequence of the invention has properties similar to those 
of other proteins known to contain this/these domain(s) and similar to the properties of these 
domains. 
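
The extent of the two trypsin domain matches relative to the 521-residue protein can be tallied as follows. This is a minimal sketch, assuming 1-based inclusive residue positions and non-overlapping matches; the variable names are illustrative.

```python
PROTEIN_LENGTH = 521  # residues encoded by clone CG50817-05, as stated earlier

# Trypsin domain matches (1-based, inclusive) as reported above.
trypsin_matches = [(61, 279), (312, 476)]

# Length of each match and the total number of residues covered.
match_lengths = [end - start + 1 for start, end in trypsin_matches]
covered = sum(match_lengths)
fraction_covered = covered / PROTEIN_LENGTH

print(match_lengths, covered, round(fraction_covered, 3))
```

Under these assumptions the two matches span 219 and 165 residues, so roughly three quarters of the protein falls within a trypsin domain match.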

Chromosomal information: 

The Peptidase disclosed in this invention maps to chromosome 16. This information was 
assigned using OMIM, the electronic northern bioinformatic tool implemented by CuraGen 
Corporation, public ESTs, public literature references and/or genomic clone homologies. This 
was executed to derive the chromosomal mapping of the SeqCalling assemblies, genomic 
clones, literature references and/or EST sequences that were included in the invention. 

Tissue expression 

The Peptidase disclosed in this invention is expressed in at least the following tissues: 
Adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain - 
substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung, 
heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, 
salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, 
trachea, uterus. This information was derived by determining the tissue sources of the sequences 
that were included in the invention, including but not limited to SeqCalling sources, public EST 
sources, and/or RACE sources. 

Cellular Localization and Sorting 

The SignalP, Psort and/or Hydropathy profile for the Peptidase-like protein are shown in 
Table 13. The results predict that this sequence has a signal peptide with a cleavage site between 
positions 35 and 36 and is likely to be localized at the plasma membrane with a certainty of 
0.6850. 
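
A predicted cleavage site between positions 35 and 36 implies a 35-residue signal peptide and a mature chain beginning at residue 36. The following minimal sketch shows that split on a placeholder sequence; the placeholder string is hypothetical and stands in for the actual sequence of Figure 17, which is not reproduced here, and the function name `split_at_cleavage` is illustrative.

```python
def split_at_cleavage(protein: str, last_signal_residue: int):
    """Split a protein into (signal peptide, mature chain).

    last_signal_residue is the 1-based position of the final residue of
    the signal peptide, so cleavage occurs immediately after it.
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# Hypothetical 521-residue placeholder (NOT the sequence of Figure 17).
placeholder = "M" + "X" * 520

signal_peptide, mature_chain = split_at_cleavage(placeholder, 35)
print(len(signal_peptide), len(mature_chain))
```

For a 521-residue precursor this leaves a mature chain of 486 residues, assuming no further processing.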
Functional Variants and Homologs 

The novel nucleic acid of the invention encoding a Peptidase-like protein includes the 
nucleic acid whose sequence is provided in Figure 17, or a fragment thereof. The invention also 
includes a mutant or variant nucleic acid any of whose bases may be changed from the 
corresponding base shown in Figure 17, while still encoding a protein that maintains its 
Peptidase-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least in 
part to enhance the chemical stability of the modified nucleic acid, such that they may be used, 
for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the 
mutant or variant nucleic acids, and their complements, up to about 1% of the residues may be so 
changed. 

The novel protein of the invention includes the Peptidase-like protein whose sequence is 
provided in Figure 17. The invention also includes a mutant or variant protein any of whose 
residues may be changed from the corresponding residue shown in Figure 17 while still encoding 
a protein that maintains its Peptidase-like activities and physiological functions, or a functional 
fragment thereof. In the mutant or variant protein, up to about 4% of the residues may be so 
changed. 
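
To give a sense of scale for these variant limits, the sketch below converts the stated percentages into position counts for the 1592 bp nucleic acid and the 521-residue protein of clone CG50817-05. It assumes the limit is taken as the largest whole number of positions not exceeding the percentage; since the text says "up to about", this cutoff rule is an illustrative assumption, as is the function name `max_changed_positions`.

```python
def max_changed_positions(length: int, percent: int) -> int:
    """Largest whole number of positions not exceeding percent of length."""
    return (length * percent) // 100


# Lengths stated for clone CG50817-05; percentages from the paragraphs above.
nucleic_acid_limit = max_changed_positions(1592, 1)  # up to about 1% of residues
protein_limit = max_changed_positions(521, 4)        # up to about 4% of residues

print(nucleic_acid_limit, protein_limit)
```

On this reading, about 15 nucleotide positions or about 20 amino acid positions could differ from the sequences of Figure 17.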

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab, 
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the 
invention. Also encompassed within the invention are peptides and polypeptides comprising 
sequences having high binding affinity for any of the proteins of the invention, including such 
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the 
surface of a carrier) such as a bacteriophage particle. 
Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, and map location for the 
Peptidase-like protein and nucleic acid disclosed herein suggest that this Peptidase may have 
important structural and/or physiological functions characteristic of the serine protease family. 
Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and 
therapeutic applications and as a research tool. These include serving as a specific or selective 
nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of 
the nucleic acid or the protein is to be assessed, as well as potential therapeutic applications 
such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody 
target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in 
gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue regeneration 
in vitro and in vivo, and (vi) a biological defense weapon. 

The nucleic acids and proteins of the invention are useful in potential diagnostic and 
therapeutic applications implicated in various diseases and disorders described below and/or 
other pathologies. For example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis; 
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS; 
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma; 
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage 
disease; obesity; nutritional edema; hypoproteinemia; and other diseases, disorders and 
conditions of the like. 

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic 
methods. 

Table 9. BLASTN identity search for the nucleic acid of the invention. 

>patn:Z34002 Human PRO351 nucleotide sequence - Homo sapiens, 2365 bp. (SEQ ID NO: 63) 

Length = 2365 

Plus Strand HSPs: 

Score = 5649 (847.6 bits), Expect = 4.3e-288, Sum P(2) = 4.3e-288 
Identities = 1135/1140 (99%), Positives = 1135/1140 (99%), Strand = Plus / Plus 

Query:   340 TCCTGCGTGAGGGACTCAGCCCCTGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG 399
             | | |||||||||||||||||| |||||||||||||||||||||||||||||||||||||
Sbjct:   639 TGCAGCGTGAGGGACTCAGCCC-TGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG 697

Query:   400 CCCAGGGCCTATAACCACTACAGCCAGGGCTCAGACCTGGCCCTGCTGCAGCTCGCCCAC 459
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   698 CCCAGGGCCTATAACCACTACAGCCAGGGCTCAGACCTGGCCCTGCTGCAGCTCGCCCAC 757

Query:   460 CCCACGACCCACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCC 519
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   758 CCCACGACCCACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCC 817

Query:   520 TCCTGCTGGGCCACTGGCTGGGATCAGGACACCAGTGATGCTCCTGGGACCCTACGCAAT 579
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   818 TCCTGCTGGGCCACTGGCTGGGATCAGGACACCAGTGATGCTCCTGGGACCCTACGCAAT 877

Query:   580 CTGCGCCTGCGTCTCATCAGTCGCCCCACATGTAACTGTATCTACAACCAGCTGCACCAG 639
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   878 CTGCGCCTGCGTCTCATCAGTCGCCCCACATGTAACTGTATCTACAACCAGCTGCACCAG 937

Query:   640 CGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTG 699
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   938 CGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTG 997

Query:   700 CAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACAC 759
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   998 CAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACAC 1057

Query:   760 TGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTG 819
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1058 TGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTG 1117

Query:   820 CTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCT 879
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1118 CTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCT 1177

Query:   880 TTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGT 939
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1178 TTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGT 1237

Query:   940 GGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCC 999
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1238 GGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCC 1297

Query:  1000 AGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTG 1059
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1298 AGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTG 1357

Query:  1060 CTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTG 1119
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1358 CTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTG 1417

Query:  1120 GGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCAC 1179
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1418 GGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCAC 1477

Query:  1180 CCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCC 1239
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1478 CCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCC 1537

Query:  1240 AGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGC 1299
             ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
Sbjct:  1538 AGCCTGCGGCCCCTCTGCCTGCCCTATCCTGACCACCACCTGCCTGATGGGGAGCGTGGC 1597

Query:  1300 TGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTG 1359
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1598 TGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTG 1657

Query:  1360 ACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGC 1419
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1658 ACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGC 1717

Query:  1420 CCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCC 1479
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
Sbjct:  1718 CCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGC 1777

Score = 948 (142.2 bits), Expect = 3.0e-74, Sum P(2) = 3.0e-74
Identities = 882/1448 (60%), Positives = 882/1448 (60%), Strand = Plus / Plus



TCACCACCTATGCTATCAACGTGAGCCTGATGTGGCTCAGTTT-CCGGAAGGTCCAAGAA 168 

IS ' llllll MM I llllll llllll I II I II I I 

TGACCTCATCTGCTTTGCTT - TGGTCTTCAAGCCGCTCAGCGTGCCTGT - GGACAGCGTG 443 

CCCCAGGGCCAACCCAAGCCTCAGGAGGGCAACACAGTCCCTGGCGAGTGGCCCTGGCAG 228 

Ml M II IIIIIIIIIIIIIIIIIMIIIIIIIIIIIMIMMIIIIIMIMI 

20 Sbjct: 444 GCCCCGGCCCC-CCCAAGCCTCAGGAGGGCAACACAGTCCCTGGCGAGTGGCCCTGGCAG 502 

GCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGGCTCCCTGGTGGCAGACACCTGG 288 

MMMIMMMIMMMMMMMMMMMMMMIMMM lllllllllll 

GCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGGCTCCCTGGTGGCAGACACCTGG 562 



Sb j ct : 


1598 


Query: 


1360 


Sbjct: 


1658 


Query: 


1420 


Sbjct: 


1718 


Score 


= 948 


Identities = 


Query : 


110 


Sbj ct : 


386 


Query : 


169 


Sbj ct : 


444 


Query: 


22 9 


Sbj ct : 


503 


Query: 


289 


Sbj ct : 


563 


Query: 


349 


Sbj ct : 


622 


Query : 


406 


Sbj ct : 


O / D 


Query : 


A CO 


Sbj ct : 




Query : 




Sbj ct : 


787 




574 


QhH - 
otfj t. ti . 


847 


Query: 


632 


Sbjct: 


899 


Query: 


687 


Sbj ct : 


958 


Query: 


743 


Sbj ct : 


1016 


Query: 


799 


Sbjct: 


1071 


Query: 


855 


Sbjct: 


1127 


Query: 


908 



GTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAGCAACAGAACTGAATTCCTGCGTG 348 

IIIIIIIIIIIIIIIIIMIIIIIIIMIMMIIIIIIIIMIIIIMIIIIIII II 

GTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAGCAACAGAACTGAATTCCTG - GTC 621 



30 Query: 349 AGGGACTCAGCCCCTGGGGCCGAAG-AG-GTGGGGGTGGCTGCCCTGCAGTTGCCCAGG- 405 

II I II I MM I I II III III I llllll I I III 

AGTGG -TC C- TGGGTTCTCTGCAGCGTGAGGGACTCAGCCCTGGGGCCGAAGAGGT 675 

GCCTATAACCACTACAGCCAGG-GCTCAGA-CCTGGCCCTGCTGCAGCTCGC-C-C7VCCC 461 

35 ~ I I I I I II II II III III I II MM I I II I 



CACGACCCACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGA - GCCT 520 

I I II I I III III IIIM I MM I I Mill II III 

40 Sbjct: 734 CTGGCCCTGCTG-CAGCTC-GCCCACCCCA- -CGACCCA-CACA-CCCCTCTGCCTGCC- 786 



CCTGCTGGGCCACTGGCTGGGATCAGGA- -CACCAG-TGATGCTC CTGGGACCCT-A 573 

II II MM I I I III I II Ml II llllll I I 



I III MM I II I I III Ml Ml III II I II 

;tgatgctcctgggaccc-t-a- -CGCAAT-C-TGCGCCTGCGTCT-CATC-AGT 898 



50 Query: 632 TGCACCAGCGACACCTGTC-CAAC- -CCGGCCCGGCCTGGGATGCTATGTGGGGGCC- -C 686 

M Ml I MM I II II II II I II I III II I 

C6CCCCACAT6TAACTGTATCTACAACCAGCTGCACCAGCGACACC - TGTCCAACCCGGC 957 
CCAGCCTGGGGTGC - A - G - GGCCCCTGTCAGGGAGAT - TCCGGGGGCCCTGTGCTGTGCC 742 

55 ' II lllllll MM III I III I II lllllllll II I 

JGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGT - CAGGGA - 1015 



II M II MM II MM III I I I I II I II MM 



GCC-CAGGAGGAC-GCTCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTC- - CTGGCT 8 54 

II II II II I Mil II III I I MM I II I III I 



-CA- -G- -GCTCG-AGTTCAGGGG-GCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGG 907 

II I MM Mill II MM I I I MM Ml II II 
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1 1 



I I I I II 



I II II II I 



Sb j Ct : 


1185 


Query: 


963 


Sbjct : 


1244 


Query: 


1021 


Sb j ct : 


1297 


Query: 


1079 


Sbjct: 


1352 


Query: 


1134 


Sb j ct : 


1412 


Query : 


1191 


Sbjct: 


1468 


Query: 


1248 


Sb j ct : 


1526 


Query: 


1305 


Sbjct: 


1583 


Query: 


1365 


Sb j ct : 


1639 


Query: 


1423 


Sbjct: 


1698 


Query: 


1478 


Sbjct: 


1756 


Query: 


1537 


Sbjct: 


1811 


Score 


= 894 


Identities = 


Query: 


1 


Sb j ct : 


171 


Query: 


61 


Sbjct: 


231 


Query: 


121 


Sbjct: 


291 


Query: 


181 


Sbjct: 


351 



CC - AGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG - CCAGGCTGATGCACCAGGGACAG 1020 

III I nil mill III nil n i in n nil 

TTGAGG - AC - AGCAGG - TCCCCA- GGCA GGAGCACCCTCCCCATGGCCCTGGGAGGC 1296 



I III II II II II II I 



I inn II 



II II I 



TCATTGGGCGCCAG - GCCC - CAGAGGAATGGAGCGT - AGGGCTG - G - GGACCAGACCGGA 1133 

I II I nil II III III III I I III n II I 

GCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTA 1411 
GGAGTGGGG - CCTGAAGCAGCTC7V - TCCTGCATGGAGCCTACACCCACCCTG - AGGGGGG 1190 

II inn II II M II II II III I I II nil i ii i 

GGGCTGGGGACCAGAC-CGGAGGAGTGGGGCCTGAAGC--AG-CTCATCCTGCATGGAGC 1467 
CTACGACATGGCCCTCCTGCTG - CTGGCCCA-GCCTGTGACACTGGGAGCC - AGCCTGCG 1247 

MM II MM I I II Mill III Ml MUM I 

CTAC-ACCCA-CCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTG 1525 



I I 



III III 



II I II I llllll II II I 



I II I 



1364 



TCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCRGACAGTGCCCGTGACCCT 

MM I II I I M I II I II I I MM II I I I I 

GATGGG - - GAGCXSTGGCTGGGTTCTGGGACGGGCCCGC- CCAGG- AGCAGGCATCAGCTC 1638 

CCTGGGGCCTAGGGCCTGC-AGCCGGCTGCATGC-AGCTCCTGGGGGTGATGGCAGCCCT 1422 

III I I II III I I II III I 11 nil 11 III I 

CCTCCAGAC - AGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAG 1697 

ATTCTGCCGGGGATGGTGTGTACCAGT- -GCTGTGGGTGAGCTGC-CCAG- -CTGTGAGG 1477 

I III II inn I I I I II I II n n iin inn ii 

CTCCTGGGGGTGATGGCA - GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTG - GG 1755 



CATTCTCAGGAACAAGAGAATGCAGGC 
II I I II II III II III 



1536 



1810 



CCAACCAACCAGCTGCTGACAGGGGACCTGGC 

I I iiinii III I nil 



nil nil 11 III 



Score = 894 (134.1 bits), Expect = 4.3e-288, Sum P(2) = 4.3e-288
Identities = 182/186 (97%), Positives = 182/186 (97%), Strand = Plus / Plus

Query:     1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   171 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 230

Query:    61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   231 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 290

Query:   121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| | 
Sbjct:   291 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCAAG 350



II 



Score = 699 (104.9 bits), Expect = 9.8e-60, Sum P(2) = 9.8e-60 

Identities = 391/603 (64%), Positives = 391/603 (64%), Strand = Plus / Plus



Query:   990 CTGGGAGGCCAGGCTGATGCAC-CAGGGACAGCTGGCCTGTGGCGGAGC--CCTGG--TG 1044

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SbjCt : 


1508 


Query : 


1045 


Sbjct : 


1564 


Query: 


1102 


Sbjct: 


1623 


Query: 


1160 


Sbjct: 


1679 


Query : 


1220 


Sbj Ct : 


1734 


Query: 


1276 


Sbjct: 


1791 


Query: 


1334 


Sbjct: 


1849 


Query: 


1391 


Sbjct: 


1907 


Query: 


1450 


Sbjct: 


1961 


Query: 


1506 


Sbjct : 


2021 


Query: 


1566 


Sbjct: 


2080 



III I MM Ml II II I Mil II Mill III I MM I 

CTGCTGGCCCAGCCTG-T6-ACACTGGGA- -GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 1563 



II II I 



III I I II I I III II I II Mill I II 



GAATGGAGCGTAGGGCTG6GGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCAT- -CC 1159 

II I II I I II llllll I III III I II I II 

GAGCAG -GCATCAG - CTCCCT - CCAGACAGTGCCCGTGAC- CCTCCTGGGGCCTAGGGCC 1678 

TGCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCC 1219 

MM I Mill II I I Mill I I II Mill I Mil II 

TGCA-GCCGGCTGCATGCAGC-TCCTGGGGGTGATGGCA--GCCCTATT-CTGCCGGGGA 1733 



I MM llllll I II MM III I Ml I Ml 



II 



1275 



CACC- -TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCA 1333 

III III III II III I I Mill II Mill II III III 

CACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGA 1848 

T - CA- GCTCCCTCCA- GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGC 1390 

I I II II I III III i I I II I II MM I I I 

TGCTTGCCAAGGCCCCGCCAG - GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT - GAGGAC 1906 



TGCATGCAGCTCCTGGGGGTGATGGCAGCCCTA 

II I Mil llllll III III 



■TTCTGCCGGGGATGGTGTGTACCAGT 1449 

III WW III III II 

:TTC-GCCGAGGAACCAGAGCCCGAG- i960 



GCTGTGGGTG-A-GCTGCCCAGCTGTGAG- -GCCAACCAACCAGCTGCTGACAGGGGACC 1505 

MM I II I llllll II I IIIIIIMIIIIIIIIIIIIIIIIIIIII 

GCTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACC7VGCTGCTGACAGGGGACC 2020 

TGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 1565 

lllllllllllllll llllllllllllllllllllllllllllllllllllllllllll 

TGGCCATTCTCAGGA - CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 2079 

CCCACCCTGTCATGTGTGATTCCAGGC 1592 

lllllllllllllllllllllllllll 
CCCACCCTGTCATGTGTGATTCCAGGC 2106 
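
By way of illustration only, the identity figures reported for each HSP in Table 9 (for example, Identities = 1135/1140 (99%)) can be recomputed from the aligned rows themselves. The following is a minimal Python sketch and is not part of the BLAST program; the function name `percent_identity` is hypothetical, and the sketch assumes that the two rows have equal length and that `-` denotes a gap. The two example rows are the first aligned row of the first HSP above (Query 340 to 399, Sbjct 639 to 697).

```python
def percent_identity(query_row: str, sbjct_row: str) -> tuple[int, int, float]:
    """Count identical, non-gap columns in one pair of aligned rows.

    Returns (identities, alignment_columns, percent). Illustrative helper
    only; gaps are assumed to be written as '-'.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same number of columns")
    identities = sum(
        1 for q, s in zip(query_row, sbjct_row) if q == s and q != "-"
    )
    columns = len(query_row)
    return identities, columns, 100.0 * identities / columns


# First aligned row of the first HSP against patn:Z34002.
query_row = "TCCTGCGTGAGGGACTCAGCCCCTGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG"
sbjct_row = "TGCAGCGTGAGGGACTCAGCCC-TGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG"

identities, columns, percent = percent_identity(query_row, sbjct_row)
print(f"Identities = {identities}/{columns} ({percent:.0f}%)")
```

Summing the per-row counts over all rows of an HSP gives the numerator and denominator that BLAST prints on the Identities line.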



>patn:A37664 Human peptidase, HPEP-8 coding sequence - Homo sapiens, 1661 bp. (SEQ ID NO:64)

Length = 1661

Plus Strand HSPs:

Score = 3831 (574.8 bits), Expect = 5.6e-168, P = 5.6e-168
Identities = 767/768 (99%), Positives = 767/768 (99%), Strand = Plus / Plus



Query: 


712 


Sbjct: 


320 


Query: 


772 


Sbjct: 


380 


Query: 


832 


Sbjct: 


440 


Query: 


892 


Sbjct : 


500 


Query: 


952 



CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 771 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 379 

GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 831 

IIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 439 

ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 8 91 

IIIIIMIIIINIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIMIIIIINI 

ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 499 

AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 951 

llllilllllllllllllllllllllMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 559 

ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 1011 

IIIIIIIIIIIIIIIIIIIIIIIINIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIII 






Sbjct : 


560 




Query: 


1012 


5 


Sbjct : 


620 




Query : 


1072 


10 


Sbjct: 


680 




Query: 


1132 




Sb j Ct : 


740 


15 


Query : 


1192 




Sbjct: 


800 


20 


Query : 


1252 


Sbjct: 


860 




Query: 


1312 


25 


Sbjct : 


920 




Query : 


1372 


30 


Sbjct: 


980 




Que ry : 


1432 




Sbjct : 


1040 


35 


Score 


= 974 




Identities « 




Query: 


546 


40 


Sbjct: 


1 




Query: 


606 


45 


Sbjct: 


61 




Query : 


666 




Sb j ct : 


121 


50 


Query : 


725 




Sbjct: 


181 


55 


Query : 


782 




Sbjct: 


238 




Query: 


837 


60 


Sbjct: 


295 




Query : 


890 


65 


Sbjct: 


352 




Query : 


947 




Sbjct : 


404 


70 


Query: 


1003 



560 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 619 



CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 1071 

MMMIIMMIIIIIIIIIIIIIIIIIIIIIIMIIIIIMIIIIIIIIIIIIIIIII 

CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 679 

CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 1131 

IIIIIIIIIIIIIIIIIIIIIMIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIII 

CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 739 

GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 1191 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 799 

TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 1251 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIII 

TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 859 

CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 1311 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 919 

CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 13 71 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 979 

CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1431 

IIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIII 

CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1039 



IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIII I 



Score = 974 (146.1 bits), Expect = 6.1e-39, P = 6.1e-39
Identities = 632/998 (63%), Positives = 632/998 (63%), Strand = Plus / Plus



GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMMIIIIIIIIIIIIIIIII 

GGACACCAGTGAT6CTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 

CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 

IIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 
CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACTVCCTGTCCAACCCGGCCCGGCC 



IIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII III I 



605 



60 



665 



120 



II I III 



III I I II Mill I II I I Mil II 



- -CTTTGCATCA-AGCTGTGCCCAGGAGGACGCTCCTGTGCT-GCTGACCA-ACACAGC 836 

II I I Mill III III I II II I MM I I II I 



II II I I I I III Mill 



II III IMMI II I III III 



889 



351 



AGAGCCC-AGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCT- - 946 

Mill I MM III I M I II MM I III I II III 

CGAGCCTGACGGACACTGG - GTTCAG - GCTG - - G - CATCA - TC - AGCTT - TGCATCAAGC 4 03 



II I III III II III III I II 



I II II II II 
GCTGCTCACAGTTCCTGG 4 60 






Sbjct : 


461 


Query: 


1062 


Sbjct: 


520 


Query : 


1120 


Sbjct: 


579 


Query : 


1178 


Sbjct: 


637 


Query: 


1235 


Sbjct : 


694 


Query : 


1292 


Sbjct: 


751 


Query: 


1350 


Sbj Ct : 


806 


Query: 


1404 


Sbjct: 


863 


Query: 


1460 


Sbjct: 


922 


Query : 


1519 


Sbjct: 


979 


Score 


= 706 


Identities = 


Query: 


990 


Sbjct: 


818 


Query: 


1047 


Sbjct: 


874 


Query: 


1103 


Sbjct: 


934 


Query: 


1161 


Sbj ct : 


990 


Query: 


1221 


Sbjct : 


1045 


Query: 


1277 


Sbjct: 


1102 


Query: 


1335 


Sbjct: 


1160 



Ml I II I II III III I I III! I lllll III I I 

CTGCAGGCTCGAGTT - CAGGGGGCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGAT 519 

AACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAG - - CGTAGGGCTG 1119 

I II II lllll II I I II Mill I II I III I 
GAGTGATGAGGACAGCTGTGTAGCCTGTG-GATCCTTGAGGACAGCAGGTCCCCAGGCAG 578 

GGG-ACCAGACCGGAGGA6TGGGGCCTGAAGCAGCTCATCCTGCATGGAGC-CTACACCC 1177 

I I III II II III II III II I 11*111 II I 

GAGCACCCTCCCCATGGCCCTGGGAGGCCAG--GCTGATGCACCAGGGACAGCTGGCCTG 636 

ACCCTGAGGGGGGCTA - C - GACATGGCCCTCCTG - CTGCTGGCCCAGCCTGTGACACTGG 1234 

MM I I III III I II llllll III! Ill II III 

TGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTG-CCCA- -CTGCTTCATTGG 693 

GAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATG - CTGACCACCAC - CTGCCTGA - TGGGG 1291 

I Mill III II M III III MM I I II lllll 

GCGCCAGGCCCC7^GAGGAA-TGGA-GCG-TAGG6CTGGGGACCAGACCGGAGGAGTGGGG 750 
AGCGTGGCTGGGT - TCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCC - CTCCAGAC 1349 

I II I M I I II II I I M I I II III Ml 

CCTGAAGCAGCTCATCCTGCATGGAGCCTAC-ACC- -CACCC-TGAGGGGGGCTAC-GAC 805 

AGTGCCCGTGACCCTCCTGGG GCCTAGGGC- CTGC-AGCCGGC-TGCATGCAGCTCC 1403 

I MM I II MM MM I I III MM II III II III 

ATGGCCC - TCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCG - GCCCCTC - 862 

TGGGGGTGATG - GCAG - CC - CTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGT - G 1459 

II I I M I II M lllll I II I II I I I I MM I 

TGCCTGCCCTATGCTGACCACCAC - CTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACG 921 
AGCT - GCCCAGCTGTGAGGCCAACCAACCAGCTGCTGACAGGGGACCTGGCCATTCTCAG 1518 

II llllll I MM I II I III lllll I I II II I II I 

GGCCCGCCCAGGAGC-AGGC- -ATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGG 978 



lllll lllll I M 



Score = 706, Expect = 1.9e-23, P = 1.9e-23



CTGGGAGGCCAGGCTGATGCAC-CAGGGACAGCTGGCCTGTGGCGGAGC- - CCTGGTGTC 1046 

III I MM III II II i MM II lllll III I MM I 

CTGCTGGCCCAGCCTG-TG-ACACTGGGA- -GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 873 

AGAGGAGGCGGTGCTAACTGCTGCCCA - C - TG - CTTCATTGGGCGCCAGGCCC - CAGAGG 1102 

I II II III II I I II II Mill lllll I III 

TGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGG 933 



1160 



I I II I I II llllll I III III I II I III 

AGCAG-GCATCAG-CTCCCT-CCAGACAGTGCCCGTGAC-CCTCCTGGGGCCTAGGGCCT 



989 



GCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCA 1220 

III I Ml II II II lllll I I II Mill I MM II 

GCA - GCCGGCTGCATGCAGC - TCCTGGGGGTGATGGCA - - GCCCTATT - CTGCCGGGGAT 1044 

GCCTGTG - ACACTGGGA - GCCAGCCTGCGGCCCCTCTGCCTGC - CCTATGCTGAC - CACC 1276 

I MM llllll I M MM III I III I III MM 

GG - TGTGTAC - CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT - CTGGGGCACC 1101 

ACC- -TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCAT 1334 

II III III II Ml I I Mill M Mill M III I I II 



- CA - GCTCCCTCCA - GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 1391 

III II I III MM I I M I II MM I I M 






I I nil III II I III III III UN III I I I III 

GGGT-CAGCAGTTTGGACTG- -G-CAGGTCTACTTC-GCCGAGGAACCAGAGCCCGAG-G 1271 

CTGTGGGTG-A-GCTGCCCAGCTGTGAG- -GCCAACCAACCAGCTGCTGACAGGGGACCT 1506 

III I II I mill u I iiiiiiiiiiiiiiNiiiiiiiiiiiiii 



10 I II nil 1 1 II II I iiiiiiiiiiiiniiiiiiniiiiiiiiiiiiiiiiiiiiiii 

GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 139 

Query:  1567 CCACCCTGTCATGTGTGATTCCAGGC 1592
             ||||||||||||||||||||||||||
Sbjct:  1391 CCACCCTGTCATGTGTGATTCCAGGC 1416

Score = 481 (72.2 bits), Expect = 1.1e-12, P = 1.1e-12
Identities = 409/666 (61%), Positives = 409/666 (61%), Strand = Plus / Plus






Query: 


1392 


Sbj Ct : 


1218 


Query: 


1451 


Sbjct : 


1272 


Query : 


1507 


Sbjct: 


1332 


Query: 


1567 


Sbjct : 


1391 


Score 


» 481 


Identities = 


Query: 


207 


Sbjct : 


584 


Query: 


267 


Sbjct : 


643 


Query: 


326 


Sbjct: 


702 


Query : 


386 


Sbjct : 


755 


Query : 


443 


Sbjct: 


813 


Query: 


491 


Sbjct : 


873 


Query: 


550 


Sbjct : 


931 


Query: 


610 


Sbj ct : 


984 


Query : 


669 


Sbj ct : 


1042 


Query: 


728 


Sbjct: 


1098 


Query: 


783 


Sbjct : 


1156 


Query: 


835 


Sbjct : 


1215 



MM I I MMMMI MMMI Ml I M Ml I III Mil 



25 Query: 267 CTCCCTGGTGGCAGACACCTGGGTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAG- 325 

IIIMIII MM Ml II IIMMIIIMMMII I Ml Ml 

AGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTC - ATTGGGCGCCAGG 701 
CAACAGAACTGAATTCCTGCGTGAGGGACTCAGCCCCTGGGGCCGAAGAGGTGGGGGTGG 3 85 

30 ~ I MM MM MM MM M I III I M III Mill I 

CCCCAGAG- -GAATGGA-GCGT-AGGG-CTGGGGACCAGAC-CGGAGGAG-TGGGGCCTG 754 
CTGCC - CTGCAGT - TGCCCAGGGCCTATAACCACTAC - AGCCAGGGCTCAGACCTGGCCC 442 

M M M III I Mill I MM II Mill Ml IIIMI 

35 Sbjct: 755 AAGCAGCT-CATCCTGCATGGAGCCTACACCCACCCTGAGGG-GGGCTACGACATGGCCC 812 
TGCTGCAGCTCGCCCACCC CAC - - G - ACCCA - CA - - CA - CCCCTCTGCCTGCCCC 4 90 

I MM III Mill II III I I III I I lllllllllllllll 

TCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCT 872 



AGCCCGCCCATCGCTTCCCCTTTGGAGCCTCCTG - CTGGGCCACTGGCTGGGATCAGGAC 549 

I II III! II II nil III lllll MM II I I I 

ATGCTGACCACCACCTGCCTGATGGGGAG- CGTGGCTGGGTT- CTGGGACGGGCCCGCCC 930 



II I II II MM III I I III I I M II II II I 

;AGC-AGGCATCAGCT-CCCT-C-CAGACAGTGCCCGTGACCC-TCC-TGGGGCCT-A 983 



TGTAACTGTATCTACAACCA-GCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCCTGG 668 

SO ' I Mill II II II II I I II I I III I Ml II 

GGGC-CTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGC-AGCCCTATTCTGCCGGG 1041 
GATGCTATGTGGGGGCCCCCAGCCTGGG - GTGCAGGGCCCCTGTCAGGGAGATTCCGGGG 727 

MM I Ml III MM lllll MM MM M MM 

55 Sbjct: 1042 GATGGTGTGTACCAGTGCT--G--TGGGTGAGCTGCCCAGCTGTGAGGGCCTGTCTGGGG 1097 

- CC - CTG - TGC - TGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCTGG - CATCATCAG 782 

II III III II I M I I II nil I lllll II I II I 



III I I III I III III I II II I I I III I II I I I 

lGATGCTTGCCAAGGCCCCGCC-AGGCCGGCGGTCTTCACCGCGCTCCCTGCCTATGAC 
;CTGC - TCA - CAGTTCCTGG - CTG- CAGGCTCGAGTTC 868 

III III inn III III Mil II I in 






Figure 10. BLASTP identity search for the protein of the invention. 

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO: 65)

Length = 571

Plus Strand HSPs:

Score = 2544 (895.5 bits), Expect = 1.1e-263, P = 1.1e-263
Identities = 476/493 (96%), Positives = 479/493 (97%), Frame = +1






Query: 


19 


SbjCt : 


1 


Query : 


199 


Sb j ct : 


60 


Query: 


358 


Sb j ct : 


120 


Query: 


535 


Sbj ct : 


180 


Query: 


715 


Sbj ct : 


240 


Query: 


895 


Sbjct : 


300 


Query: 


1075 


Sbjct: 


360 


Query: 


1255 


Sbjct: 


420 


Query: 


1435 


Sbj ct : 


480 


Score 


= 324 


Identities = 


Query : 


187 


Sbjct: 


322 


Query: 


367 


Sbjct: 


379 


Query : 


535 


Sbj ct : 


438 


Query: 


709 


Sbjct : 


492 



MLLSSLVSLAGSVYLAWILFFVLYDFCIVCITTYAINVSLMWLSFRKVQEPQGQPKPQEG 198 

lllllllllllllllllllllllllllllllllllllllllllllllllllll^ I ^ I 
MLLSSLVSLAGSVYLAWILFFVLYDFCIVCITTYAINVSLMWLSFRKVQEPQGKAK-RHG 59 

NTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNS- -CVRDS 357 

IIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIMIIIIIIIIII I I 
NTVPGEWPWQASVRRQGAHI CSGSLVADTWVIiTAAHCFEKAAATELNSWSWLGSLQREG 119 

-APGAEEVGVAALQLPRAYNHYSQGSDIiALLQLAHPTTHTPLCLPQPAHRFPFGASCWAT 534 

^llllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
LSPGAEEVGVAAIiQLPRAYNHYSQGSDLALLQIiAHPTTHTPLCLPQPAHRFPFGASCWAT 179 

GWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 714 

IIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 
GWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHIiSNPARPGMLCGGPQPGVQGPCQ 239 

GDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQS 894 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
GDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQS 299 

PETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAH 1074 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
PETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVS EEAVLTAAH 359 

CFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPL 1254 

IIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLIiAQPVTLGASLRPIi 419 

CLPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPG 1434 

INI IIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPG 479 



llllllllllllll 



Score = 324 (114.1 bits), Expect = 7.0e-26, P = 7.0e-26
Identities = 91/250 (36%), Positives = 123/250 (49%), Frame = +1

PQEGNTVPGEWPWQASVRRQGAHI CSGSLVADTWVLTAAHCFEKAAATELNSCVRDSAPG 366 

II I I llhl ^ II I IHh^ llllllll III +1 

PQAG- -APSPWPWEARLMHQGQIiACtSGALVSEEAVLTAMCFIGRQAPEEWSVGLGTRP- 378 

AEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH TPLCLPQPAHRFPFGASCWAT 534 

II h I I II I I kill II I I Mill II I I I 

- EEWGLKQL I LHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYPDHHLPDGERGWVL 437 



708 



GRARPGAGI 



+ I + + + I + I + 

- SSLQTVPVTLLGPRACS 



HI + I lll+l I 
RLHAAPGGDGSPILPGMVCTSAV-GELPS 



CQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWIiQARVQGAAFLA 

hi II I I I II II +1 I + I h h + + I 

CEGLSGAP - LVHEVRGTWFLAGLHSFGDACQGPARPAVFTALPAYEDWVSS - LDWQVYFA 



491 



888 



549 






Query: 889 QSPETPEMSDEDSCVA 936 

^ II II ^+ ll+l 
Sbjct: 550 EEPE-PE-AEPGSCLA 563 

>patp:Y90291 Human peptidase, HPEP-8 protein sequence - Homo sapiens, 267 aa. (SEQ ID NO: 66)

Length = 267

Plus Strand HSPs:

Score = 1028 (361.9 bits), Expect = 5.0e-103, P = 5.0e-103
Identities = 189/189 (100%), Positives = 189/189 (100%), Frame = +1

Query:   910 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 1089
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:     1 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 60

Query:  1090 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 1269
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    61 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 120

Query:  1270 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 1449
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   121 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 180

Query:  1450 AVGELPSCE 1476
             |||||||||
Sbjct:   181 AVGELPSCE 189
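
By way of illustration only, the Query coordinates in Figure 10 advance by 180 for each row of 60 residues, and each HSP reports Frame = +1, which suggests that the Query positions are nucleotide coordinates of the translated query rather than residue numbers. Under that assumption, the following minimal Python sketch converts a Query coordinate range into a residue count; the function name `translated_hsp_length` is hypothetical and is not part of the BLAST program.

```python
def translated_hsp_length(query_start: int, query_end: int) -> int:
    """Number of residues spanned by inclusive nucleotide coordinates.

    Assumes the coordinates are nucleotide positions covering whole
    codons, as the Frame = +1 annotation suggests. Illustrative only.
    """
    span = abs(query_end - query_start) + 1
    if span % 3 != 0:
        raise ValueError("coordinate span is not a whole number of codons")
    return span // 3


# HSP against patp:Y90291: Query 910 to 1476, reported as 189/189 identities.
print(translated_hsp_length(910, 1476))
```

The count obtained for Query 910 to 1476 agrees with the alignment length of 189 reported on the Identities line of that HSP.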

Score = 316 (111.2 bits), Expect = 4.2e-27, P = 4.2e-27
Identities = 90/250 (36%), Positives = 122/250 (48%), Frame = +1



Query: 187 PQEGNTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSCVRDSAPG 366 

35 II I I IIIH ^ II I hll^+ llllllll III +1 

Sbjct: 18 PQAG--APSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRP- 74 

Query: 367 AEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH TPLCLPQPAHRFPFGASCWAT 534 

II 1^ I I II I I kill II I I Hill III I 

40 Sbjct: 75 -EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWVL 133 

Query: 535 GWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSN- -PARPGMLCGGPQPGVQGP 708 

I 4- + +1+ + 4- 1+ +11 I I I l + l I 

Sbjct: 134 GRARPGAGI-SSLQTVPVTLLGPRACS RLHAAPGGDGSPILPGMVCTSAV-GELPS 187 



Query- 709 CQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLA 888 

hi I I I I I I I I h I I ^1 I + I 1^ 1+ + + +1 

Sbjct: 188 CEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTALPAYEDWVSS-LDWQVYPA 245 



50 Query: 889 QSPETPEMSDEDSCVA 936 

> II II +^ 11^1 
Sbjct: 246 EEPE-PE-AEPGSCLA 259 

Table 11. BLASTN identity search (versus the human SeqCalling database) for the 
Peptidase-like protein of the invention. 

>s3aq:153687026 Category D: 377 frag (6 5'sig-CG, 204 non-5'sig-CG, 167 non-CG EST), 1114 bp. (SEQ ID NO: 67)

Length = 1114



Minus Strand HSPs:

Score = 894 (134.1 bits), Expect = 3.1e-35, P = 3.1e-35
Identities = 182/186 (97%), Positives = 182/186 (97%), Strand = Minus / Plus

Query:   186 CTTGGGTTGGCCCTGGGGTTCTTGGACCTTCCGGAAACTGAGCCACATCAGGCTCACGTT 127
             ||| |  | |||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   413 CTTAGCCTTGCCCTGGGGTTCTTGGACCTTCCGGAAACTGAGCCACATCAGGCTCACGTT 472

Query:   126 GATAGCATAGGTGGTGATACAAACAATGCAGAAATCATAGAGCACGAAGAACAGGATCCA 67
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   473 GATAGCATAGGTGGTGATACAAACAATGCAGAAATCATAGAGCACGAAGAACAGGATCCA 532

Query:    66 GGCCAGGTAGACAGAACCAGCGAGAGACACCAGGGAGCTCAGCAGCATCAGGACAGAGGC 7
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   533 GGCCAGGTAGACAGAACCAGCGAGAGACACCAGGGAGCTCAGCAGCATCAGGACAGAGGC 592

Query:     6 CCAGCG 1
             ||||||
Sbjct:   593 CCAGCG 598
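
By way of illustration only, an HSP reported as Strand = Minus / Plus shows the query as its reverse complement, so the Query coordinates run in descending order (186 down to 1 in the HSP above). The following minimal Python sketch recovers the plus-strand query segment from one such row; the function name `reverse_complement` is hypothetical, and the sketch assumes an unambiguous A/C/G/T alphabet.

```python
# Translation table for the four unambiguous DNA bases (an assumption of
# this sketch; ambiguity codes such as N are not handled).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Query row 186..127 of the minus-strand HSP against s3aq:153687026.
minus_row = "CTTGGGTTGGCCCTGGGGTTCTTGGACCTTCCGGAAACTGAGCCACATCAGGCTCACGTT"

# Plus-strand query segment, i.e. query positions 127..186 read forward.
plus_row = reverse_complement(minus_row)
print(plus_row)
```

Read forward, the recovered segment overlaps the Query row 121 to 180 shown for the plus-strand HSPs elsewhere in these tables, which is a convenient consistency check on the coordinates.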

15 

>s3aq:152507187 17 frag (1 5'sig-CG, 7 non-5'sig-CG, 9 non-CG EST), 588 bp. (SEQ ID NO:68)

Length = 588

Plus Strand HSPs:

Score = 882 (132.3 bits), Expect = 2.1e-34, P = 2.1e-34
Identities = 178/180 (98%), Positives = 178/180 (98%), Strand = Plus / Plus

Query:     1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
             ||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||
Sbjct:   367 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGTTTCTGTCTAC 426

Query:    61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   427 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 486

Query:   121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA 180
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
Sbjct:   487 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGGCAA 546

>s3aq:153485867 Category D: 3 frag (1 non-5'sig-CG, 2 non-CG EST), 612 bp. (SEQ ID NO:69)

Length = 612

Plus Strand HSPs:

Score = 785 (117.8 bits), Expect = 1.7e-29, P = 1.7e-29
Identities = 157/157 (100%), Positives = 157/157 (100%), Strand = Plus / Plus

Query:     1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   456 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 515

Query:    61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   516 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 575

Query:   121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 157
             |||||||||||||||||||||||||||||||||||||
Sbjct:   576 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 612



>s3aq:153485864 Category D: 2 frag (2 non-5' sig-CG), 425 bp. (SEQ ID NO:70)

Length = 425
Plus Strand HSPs:
Score = 785 (117.8 bits), Expect = 2.4e-29, P = 2.4e-29
Identities = 157/157 (100%), Positives = 157/157 (100%), Strand = Plus / Plus

Query:    1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  269 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 328

Query:   61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  329 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 388

Query:  121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 157
            |||||||||||||||||||||||||||||||||||||
Sbjct:  389 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 425



Table 12. ClustalW alignment of the protein of the invention. 



CO50817-05 

741704 

Y90291 



[vILLSSLVSLAOSVYLAWILFFV LYDFCI V C I T T YA I N V SLMWLS F R KV QEPi^Gj 
MLLSSLVSLAGSVYLAWILFFV LYDFCI V C I T T YA I H V SLMWLS F R KV QEPQGS 



PKPQ E 
- AKR H 



CG50817.05 1 
Y41704 1 


N 1 
U 1 


r VPG EWP 
r VPG EWP 


WQASVRRQGAH 1 
WOASVRRQGAH ] 


[CSGSLVADTV 
[CSGSLVADTV 


VV LT AAHCF 
VV LT AAHCF 


EKAA ATE LNl 
EKAA ATE LhS 


Y90291 















WSVVLO 



CG^l7-05 SA 
Y41704 L S 
Y90291 



CO50817>05 

Y41704 

Y90291 

CO50817.05 
Y41704 

Y90291 

CG508 17-05 

y41704 

Y90291 

casosn-os 

Y41704 
Y90291 

CG508I7-05 

Y41704 

Y90291 



PGAE EVGVAALQLP R A YNH Y SQGSDLA L LQLAHPTTHTP L C L P Q P A H RF P FGASCWA T 
PGA E EVO VAALOL P R A YNH Y SQGSDLA L LQLAHPTTHTP L C L P Q PA H RFP FGASCWA T 



GWDQDT SDAPGTLRN L RLRL I SRPTCNC I Y N Q L HQRH L SN P A RP GM LC G GPQ PGVQG P C Q 
GWDQDT SDAPGTLRN L RLRL I SRPTCNC I Y NQ L HQRHLSN P A RPGMLCGGPQPGVQG P CQ 



GDSGG P V LC LEPDGHWVQAGI I SFASSCA Q EDA P VLLTNT A AHSS WLQARVQGAAF LA QS 
GDSGGPVLCLE PDGHWVQAGI I SFASSCAQ EDA P VLLTNT A AHSS WLQ A RVQGAAF LA QS 



PETPEMSDEDSCVACO S LRTAG PQAGAP S P WP WEARLMHQ G Q L AC GGA L VSE EAV LT A A H 
P ETP EMSD EDSCVACG S LRTAG PQAGAP S P WPWEARLMHQOQ LACGOA LVSEEAVLT A AH 
SDEDSCVACG S L RT A G PQAGAP S P WP WE ARLMHQ G Q L AC G GA L VS E EA V LT A A H 



C F IGRQAP EEWSVGLGTRPEEWGLKQLI L H G A Y THPEGGYDMALL L LAQPVTLGAS LR PL 
C F lORQAP EEWSVGLGTRPEEWGLKQLI L H GAY THPEGGYDMALL L LAQPVTLGAS LR PL 
C F IGRO AP EEWSVGLC T RPEEWGLKQLI L HGAY TH PEGGYDMALL L LAQ PVTLGAS LR P L 



C LPYADHH LPDGERGWV LORARPOAGISS LQTV P VT L LGP R A C S R L HA A PGGDG3 P I L PG 
C LPYHDHH LPDGERGWV LGRARPGAGISS LQTV P VTLLGP R A CS R L HAAPGGDGS P I L PG 
C LPYADHH LPDGERGWV LGRARPGAGISS L QT V P VTLLGP R A CS R LHAAPGGDGS P I L PG 



15 



KSSImS^ ^^J <^ A V n F. T. P scF»l^JgPlJACTg^^»pr;^^^^^^^^^^^^^^^^^^^^MP :t; 

MVCTS A VGELPSCEGL SGAPLVHEVRGTWF L AG LHSF GDAC Q GP A R PA V FTALP A YED W\' 
wIV CT S A V GEL P SCEGL S GAP L V HE VRGTWF L AG LHSF GDAC Q GP A R PA V FTALP A YEP W^-; 



CO50817-05 E NAOR BImPaLL P LSS 
Y41704 
Y90291 




Information for the ClustalW proteins:

Accno       Common Name                                 Length
CG50817-05  novel Peptidase-like protein
Y41704      Human PRO351 protein sequence.              571
Y90291      Human peptidase, HPEP-8 protein sequence.   267

In the alignment shown above, black outlined amino acid residues indicate regions of
conserved sequence (i.e., regions that may be required to preserve structural or functional
properties); greyed amino acid residues can be mutated to a residue with comparable steric
and/or chemical properties without altering protein structure or function (e.g., L to V, I, or M);
non-highlighted amino acid residues can potentially be mutated to a much broader extent without
altering structure or function.
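
The distinction the legend draws between conservative and broader substitutions can be illustrated with a short sketch. The residue groupings below are an illustrative assumption (a common grouping by side-chain chemistry), not the scheme that was used to shade the alignment, and the helper name `is_conservative` is hypothetical.

```python
# Illustrative side-chain groupings (an assumption, not taken from the source).
GROUPS = [
    set("LVIM"),  # aliphatic/hydrophobic, matching the legend's example
    set("FYW"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar, uncharged
    set("AG"),    # small
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if the two residues are identical or share a group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in GROUPS)


if __name__ == "__main__":
    for pair in [("L", "V"), ("L", "I"), ("L", "M"), ("L", "D")]:
        print(pair, is_conservative(*pair))
```

Under these assumed groups, the legend's example (L to V, I, or M) counts as conservative, while L to D does not.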



Table 13. Psort, Signal P and hydropathy results for CG50817-05 



plasma membrane                    Certainty=0.6850 (Affirmative) < succ>
endoplasmic reticulum (membrane)   Certainty=0.6400 (Affirmative) < succ>
Golgi body                         Certainty=0.3700 (Affirmative) < succ>
microbody (peroxisome)             Certainty=0.1187 (Affirmative) < succ>

INTEGRAL Likelihood = -8.44 Transmembrane 15 - 31 (1 - 38)



Seems to be a Type II (Ncyt Cexo) membrane protein

Is the sequence a signal peptide?
# Measure   Position   Value   Cutoff   Conclusion
  max. C    36         0.688   0.37     YES
  max. Y    36         0.555   0.34     YES
  max. S    10         0.991   0.88     YES
  mean S    1-35       0.875   0.48     YES
# Most likely cleavage site between pos. 35 and 36: TYA-IN
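
Table 13 also reports a hydropathy profile computed with a window of 19. A minimal sketch of such a sliding-window calculation follows. It assumes the Kyte-Doolittle scale, which the source does not name, and both the helper `hydropathy_profile` and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


if __name__ == "__main__":
    example = "MKT" + "L" * 19 + "DDEEKK"  # hypothetical sequence
    for start, value in enumerate(hydropathy_profile(example), start=1):
        print(start, round(value, 3))
```

A window as long as 19 residues is commonly chosen because it is close to the length of a membrane-spanning helix, which fits the transmembrane segment (15 - 31) predicted above.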

[Figure: Hydropathy plot for CG50817-05 with a window of 19; x-axis: Amino Acid Number]

SECP13



A SECP13 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:46) and encoded polypeptide sequence (SEQ ID NO:47) of clone
CG50817-06 directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids encoding
them. This is a related variant of SECP11 and SECP12, clones CG50817-04 and CG50817-05.
Figure 18 illustrates the nucleic acid sequence and amino acid sequence, respectively. This
clone includes a nucleotide sequence (SEQ ID NO:46) of 1200 bp. The nucleotide sequence
includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides
33-35 and ending with a TGA codon at nucleotides 945-947. Putative untranslated regions, if
any, are found upstream from the initiation codon and downstream from the termination codon.
The encoded protein having 304 amino acid residues is presented using the one-letter code in
Figure 18.
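
The stated ORF coordinates and protein length can be cross-checked with simple arithmetic. The sketch below is only an illustration; the helper name `orf_summary` is hypothetical, and it assumes 1-based inclusive nucleotide coordinates as used in the paragraph above.

```python
def orf_summary(start_codon_first_nt: int, stop_codon_last_nt: int) -> dict:
    """Summarize an ORF given 1-based inclusive coordinates.

    The span runs from the first base of the initiation codon to the
    last base of the termination codon.
    """
    span = stop_codon_last_nt - start_codon_first_nt + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span is not a positive whole number of codons")
    codons = span // 3
    # The termination codon does not encode an amino acid.
    return {"nucleotides": span, "codons": codons, "residues": codons - 1}


if __name__ == "__main__":
    # ATG at nucleotides 33-35, TGA at nucleotides 945-947.
    print(orf_summary(33, 947))
```

With the coordinates given in the text, the ORF spans 915 nucleotides, or 305 codons including the stop codon, which agrees with the stated 304 amino acid residues.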

The protein encoded by clone CG50817-06 is predicted by the PSORT program to localize to the
cytoplasm with a certainty of 0.4500, and does not appear to contain a signal peptide (see
Table 18 below).

The DNA sequence and protein sequence for a novel Peptidase-like gene or one of its
splice forms thus derived is reported here as the invention CG50817-06. Genomic clones
having regions with 100% identity to the extended sequence thus obtained were identified by
BLASTN searches with the extended sequence against human genomic databases. The genomic
clone was selected for further analysis because this identity indicates that these clones contain
the genomic locus for these SeqCalling assemblies.

The regions defined by all approaches were then manually integrated and manually
corrected for apparent inconsistencies that may have arisen, for example, from miscalled bases in
the original fragments used, or from discrepancies between predicted homology to a protein of
similarity, to derive the final sequence of the invention CG50817-06 reported here. When
necessary, the process to identify and analyze SeqCalling assemblies, ESTs and genomic clones
was reiterated to derive the full-length sequence.

Similarities

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 840 of 842 bases (99%) identical to a gb:z34002 Human PRO351
nucleotide sequence from Homo sapiens (Tables 14 and 16). The full amino acid sequence of
the protein of the invention was found to have 278 of 279 amino acid residues (99%) identical to,
and 278 of 279 amino acid residues (99%) similar to, the 571 amino acid residue Y41704 Human
PRO351 protein from Homo sapiens (Table 15).
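
The identity percentages quoted above follow directly from the match counts. The sketch below shows the arithmetic; the helper name `percent_identity` is hypothetical. The percentages printed in the BLAST tables of this document appear to be truncated rather than rounded (for example, 635/848 is listed as 74%), so the sketch truncates as well.

```python
def percent_identity(matches: int, aligned: int) -> float:
    """Percent identity as matches over aligned positions."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the aligned length")
    return 100.0 * matches / aligned


if __name__ == "__main__":
    for matches, aligned in [(840, 842), (278, 279), (635, 848)]:
        value = percent_identity(matches, aligned)
        # int() truncates toward zero, mirroring the whole-number values in the tables.
        print(f"{matches}/{aligned} = {value:.2f}% (reported as {int(value)}%)")
```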

A multiple sequence alignment is given in Table 17, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences.

The presence of identifiable domains in the protein disclosed herein was determined by
searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain, Prints and then
determining the Interpro number by crossing the domain match (or numbers) using the Interpro
website. The results indicate that this protein contains the following protein domains (as defined
by Interpro) at the indicated positions: domain name trypsin at amino acid positions 1 to 62,
domain name trypsin at amino acid positions 95 to 259. This indicates that the sequence of the
invention has properties similar to those of other proteins known to contain this/these domain(s)
and similar to the properties of these domains.

Chromosomal information: 

The Peptidase disclosed in this invention maps to chromosome 16. This information was
assigned using OMIM, the electronic northern bioinformatic tool implemented by CuraGen
Corporation, public ESTs, public literature references and/or genomic clone homologies. This
was executed to derive the chromosomal mapping of the SeqCalling assemblies, genomic
clones, literature references and/or EST sequences that were included in the invention.

Tissue expression

The Peptidase disclosed in this invention is expressed in at least the following tissues:
adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain -
substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung,
heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate,
salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid,
trachea, uterus. This information was derived by determining the tissue sources of the sequences
that were included in the invention including but not limited to SeqCalling sources, public EST
sources, and/or RACE sources.

Cellular Localization and Sorting 

The SignalP, Psort and/or Hydropathy profile for the Peptidase-like protein are shown in
Table 18. The results predict that this sequence has no signal peptide and is likely to be localized
in the cytoplasm with a certainty of 0.4500 predicted by PSORT.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding a Peptidase-like protein includes the
nucleic acid whose sequence is provided in Figure 18, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 18 while still encoding a protein that maintains its
Peptidase-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least in
part to enhance the chemical stability of the modified nucleic acid, such that they may be used,
for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the
mutant or variant nucleic acids, and their complements, up to about 1% of the residues may be so
changed.
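
The "up to about 1%" bound on changed residues can be expressed as a simple comparison between a reference sequence and a candidate variant. The sketch below is an illustration only. It assumes equal-length sequences with substitutions but no insertions or deletions, treats the approximate bound as an exact 1% threshold by default, and uses hypothetical helper names and a made-up reference sequence.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which the variant differs from the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    diffs = sum(1 for r, v in zip(reference.upper(), variant.upper()) if r != v)
    return diffs / len(reference)


def within_variant_limit(reference: str, variant: str, limit: float = 0.01) -> bool:
    """True if no more than `limit` of the residues have been changed."""
    return fraction_changed(reference, variant) <= limit


if __name__ == "__main__":
    reference = "ACGT" * 50                                   # hypothetical 200-base reference
    variant = "T" + reference[1:100] + "C" + reference[101:]  # two substitutions
    print(fraction_changed(reference, variant), within_variant_limit(reference, variant))
```

For the 1200 bp nucleotide sequence described above, a 1% bound would correspond to at most 12 changed bases.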

The novel protein of the invention includes the Peptidase-like protein whose sequence is
provided in Figure 18. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 18 while still
maintaining its Peptidase-like activities and physiological functions, or a functional
fragment thereof. In the mutant or variant protein, up to about 1% of the residues may be so
changed.

Antibodies



The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, and map location for the
Peptidase-like protein and nucleic acid disclosed herein suggest that this Peptidase may have
important structural and/or physiological functions characteristic of the Serine protease family.
Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications and as a research tool. These include serving as a specific or selective
nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of
the nucleic acid or the protein is to be assessed, as well as potential therapeutic applications
such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody
target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in
gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue regeneration
in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications implicated in various diseases and disorders described below and/or
other pathologies. For example, the compositions of the present invention will have efficacy for
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis;
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS;
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma;
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage
disease; obesity; nutritional edema; hypoproteinemia; and other diseases, disorders and conditions
of the like.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic
methods.

Table 14. BLASTN identity search for the nucleic acid of the invention.

>patn:Z34002 Human PRO351 nucleotide sequence - Homo sapiens, 2365 bp. (SEQ ID NO:71)

Length = 2365
Plus Strand HSPs:
Score = 4192 (629.0 bits), Expect = 1.9e-184, P = 1.9e-184
Identities = 840/842 (99%), Positives = 840/842 (99%), Strand = Plus / Plus

Query:    1 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  936 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 995

Query:   61 TGCAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGAC 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  996 TGCAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGAC 1055

Query:  121 ACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTG 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1056 ACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTG 1115

Query:  181 TGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAG 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1116 TGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAG 1175

Query:  241 CTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCT 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1176 CTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCT 1235

Query:  301 GTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1236 GTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG 1295

Query:  361 CCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGG 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1296 CCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGG 1355

Query:  421 TGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGC 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1356 TGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGC 1415

Query:  481 TGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCC 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1416 TGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCC 1475

Query:  541 ACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAG 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1476 ACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAG 1535

Query:  601 CCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTG 660
            ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Sbjct: 1536 CCAGCCTGCGGCCCCTCTGCCTGCCCTATCCTGACCACCACCTGCCTGATGGGGAGCGTG 1595

Query:  661 GCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCG 720
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1596 GCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCG 1655

Query:  721 TGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCA 780
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1656 TGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCA 1715

Query:  781 GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGG 840
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1716 GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGG 1775

Query:  841 CC 842
             |
Sbjct: 1776 GC 1777
Score = 1915 (287.3 bits), Expect = 1.4e-81, P = 1.4e-81

Identities = 635/848 (74%), Positives = 635/848 (74%), Strand = Plus / Plus 
CTGGGAGGCCAGGCTGATGCAC- CAGGGACAGCTGGCCTGTGGCGGAGC - - CCTGG- -TG 4 07 

III I INI III II M I Mil II mil III I MM I 

CTGCTGGCCCAGCCTG-TG-ACACTGGGA- -GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 1563 

TCA-GAGGAGGCGGTGC-TAACTGCTGCCCACTGCTTCTVTTGGGCGCCAGGCCC-CAGAG 464 
II III III I I II I I III II I I I Mill I II 



GAATGGAGCGTAGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCAT - - CC 522 

15 ~ II I II I I II II I III I III III I II I II 

GAGCAG - GCATCAG - CTCCCT - CCAGACAGTGCCCGTGAC - CCTCCTGGGGCCTAGGGCC 1678 

TGCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCC 582 

MM I I II II II I I Mill I I II Mill I MM II 

20 Sbjct: 1679 TGCA-GCCGGCTGCATGCAGC-TCCTGGGGGTGATGGCA--GCCCTATT-CTGCCGGGGA 1733 

AGCCTGTG- ACACTGGGA-GCCAGCCTGCGGCCCCTCTGCCTGC-CCTATGCTGAC-CAC 63 8 

I MM II I I I I I II MM III I Ml I III III 



Query: 


353 


Sbj Ct : 


1508 


Query: 


408 


Sbjct: 


1564 


Query: 


465 


Sbj ct : 


1623 


Query: 


523 


Sbj ct : 


1679 


Query: 


583 


Sbjct: 


1734 


Query: 


639 


Sbjct: 


1791 


Query: 


697 


Sbj ct : 


1849 


Query: 


754 


Sbjct: 


1907 


Query : 


813 


Sbj ct : 


1961 


Query: 


869 


Sbj ct : 


2021 


Query : 


929 


Sbj ct : 


2080 


Query: 


989 


Sbj ct : 


2140 


Query: 


1049 


Sbj ct : 


2200 


Query: 


1109 


Sbj ct : 


2260 


Query: 


1169 


Sbj ct : 


2320 


Score 


= 267 



1790 

CACC - - TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCA 696 

Ml III III II III I I Mill II Mill II III III 
CACTGGTGCATGA- GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGA 1848 



30 Query: 697 T-CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGC 753 

I I II II I III III I I I II I M MM I I I 

TGCTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGAC 1906 
TGCATGCAGCTCCTGGGGGTGATGGCAGCCCTA - TTCTGCCGGGGATGGTGTGTACCAGT 812 

35 ' II Mill I II II I III III III MM III II I II 

TGGGT-CAGCAGTTTGGACTG- -G-CAGGTCTACTTC-6CCGAGGAACCAGAGCCCGAG- 1960 

GCTGTGGGTG-A-GCTGCCCAGCTGTGAG- -GCCAACCAACCAGCTGCTGACAGGGGACC 868 

MM I II I llllll II I IIIIIMIIIIIIIIMIIIMIIIIIII 

40 Sbjct: 1961 GCTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACC 2020 

TGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 928 

IIIIIIIMIIIMI MIIIIIIIIIIIIIIIIIIMIIIIIIIMMMMIIIIIM 

TGGCCATTCTCAGGA - CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 2079 



CCCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGG 988 

IIIIIIIIIIMIIIIIIIIIII MM IIIMIMIIMIIIIIIIIIIIIIIIIIIIII 

CCCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGG 2139 



50 Query: 989 GAAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTG 1048 

lllllllllllllllllllllll MM llllllllllllllllllllllllllllll II I 

GAAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCC:ACCCTGC7^GGACAGGGGTGTCTG 2199 
TGGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTA 1108 

55 ~ I II II II II II II 1 1 II 1 1 1 1 II 1 1 1 1 II 1 1 1 1 II 1 1 1 II II I II II M 1 1 1 1 1 M I I II 

TGGACACTCCCACACCCAACTCTGCTACC:AAGCAGGCGTCTCAGCTTTCCTCCTCCTTTA 2 259 
CCCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAATyVTTTCTTTTTTTGGGGGGCA 1168 

I IIIMIIIIIIIIIIIIIIIIMIIIIMMMIIIIIMIIMMIIIIIIIIMII 

60 Sbjct: 2260 CTCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCA 2319 



IMIIMMIMIIIMIIIIMMIIIIMI 

GCAGTTTTCCTTTTTTTAAACTTAAATAAATT 2351 

(40.1 bits), Expect = 0.0078, P = 0.0078
Identities = 349/598 (58%), Positives = 349/598 (58%), Strand = Plus / Plus



Query: 275 GAGTGA-TGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAG 333 

70 MM II I MM II III III MM I I I MM II II 


Sbjct: 


424 


Query: 


334 


Sbj Ct : 


478 


Query: 


393 


Sbjct: 


536 


Query : 


452 


Sbjct: 


596 


Query: 


506 


Sbjct: 


652 


Query : 


563 


Sbjct: 


710 


Query: 


621 


Sbjct: 


763 


Query: 


678 


Sbjct: 


821 


Query: 


735 


Sbjct : 


877 


Query : 


791 


Sbjct : 


934 


Query: 


849 


Sbjct: 


989 


>patn : 


A37664 


NO: 72) 





GOSTGCCTGTGGACAGC-GTG- -GCCCC-GGCCCCCCCAAGCCT-CAGGAGGGCAA-CAC 477 

GAGCACCCTCCCCA - TGGCCCTGGGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGT 392 

II INI I I llllllill lllllll III I II III I III 
- AGT - CCCTGGCGAGTGGCCCTGGCAGGCCAGTGTGAGGAGGCAAGGAGCCCACATCTGC 535 

GGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTC - ATTGGGCG 451 

nil iiiiiiii Mil III II iiiiiiiiiiiiiiiii I III 

AGCGGCTCCCTGGTGGCAGACACCTGGGTCCTCACTGCTGCCCACTGCTTTGAAAAGGCA 595 



III I nil INI 



II II II I III I II II III III 



II I II II 



II II I II 



III III 



I Mill I I I I I I II 



III III MM I I III 



III II II I llllll I 



I III 



III 



I II I 



I III I I 



- GCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGC - CC - GTGACCCTCCTGGGG 

II I III II II Mill II III I III II I llllll I I 
TGCTGGGCCACTGGCTGGGATCAGGACAC - - CAGTGA - TGCTCCTGGGACCCTAC - GCAA 



734 



876 



II I llllll II I II I II I I II 



II II 



I II II 



848 



GCCGGGGATGG - TGTGTA - CCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCCAACCAA 

II I II III I II I I I I II I III I II I MM III 

- CCAGCGACACCTGTCCAACCCGGCCCG - GCCTGGGATGCTATG - TGGG - GGCCC - CCAG 988 



III I II lllll Mil I 



Length 
Plus Strand HSPs: 



1661 



50 



Score = 3831 (574.8 bits), Expect = 5.6e-168, P = 5.6e-168
Identities = 767/768 (99%), Positives = 767/768 (99%), Strand = Plus / Plus






Query : 


75 


Sbj ct : 


320 


Query: 


135 


Sbj ct : 


380 


Query : 


195 


Sbjct: 


440 


Query : 


255 


Sbj ct : 


500 


Query: 


315 



CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 134 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIII llllllill 

CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 379 

GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 1 94 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 439 

ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 254 

IIMIIIIIIMIIIIIIIIIIIIIIIIilllllllllllllllllllllllll llllll 

ACaGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 499 

AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 3 14 

IIIIIIIIIMMIIIIIIIMIIIII llllll IIIIIMIIIIIMIIIIIIIIillll 

AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 559 



MIIIIIIIIIIIIIIIIMIIIIIIIIIIIIMI 




llllllllllllllllllll 



374 






Sbjct: 
Query: 



560 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 619 



375 






Sbjct: 


620 


Query: 


435 


Sbjct: 


680 


Query: 


495 


Sbjct: 


740 


Query: 


555 


Sbjct: 


800 


Query: 


615 


Sbj Ct : 


860 


Query: 


675 


Sbjct: 


920 


Query: 


735 


Sbj Ct : 


980 


Query: 


795 


Sbj ct : 


1040 


Score 


= 193: 


Identities = 


Query: 


353 


Sbjct : 


818 


Query: 


410 


Sbjct: 


874 


Query : 


466 


Sbj ct : 


934 


Query: 


524 


Sbjct: 


990 


Query: 


584 


Sbjct: 


1045 


Query: 


640 


Sbjct: 


1102 


Query: 


698 


Sbj ct : 


1160 


Query: 


755 


Sbj ct : 


1218 



CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 434 

IIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIillllllllllllllllll 

CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 679 



CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 494 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 739 

GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 554 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 799 

TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 614 

IIIMIIIIIMIIIIIIIIIIIIIIIIMIIIIIMIIIIIIIIIIIIIIIIIIIMII 

TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 859 

CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 674 

IIIIIIIIIIIIIIIIIIIIIIMIillllllllllllMIIIIIIIIIIIIIIIIIIII 

CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 919 

CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 734 

IIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIillllM 

CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 979 

CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 7 94 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIMIII 

CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1039 



IIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII I 



3xpect = 3.7e-82, P = 3.7e- 
Positives = 635/848 (74%) , 



82 
Strand 



Plus / Plus 



CTGGGAGGCCAGGCTGATGCAC - CAGGGACAGCTGGCCTGTGGCGGAGC - - CCTGGTGTC 409 

III I INI III II II I nil II Mill III I nil I 

CTGCTGGCCCAGCCTG- TG - ACACTGGGA- -GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 873 
AGAGGAGGCGGTGCTAACTGCTGCCCA - C - TG - CTTCATTGGGCGCCAGGCCC - CAGAGG 465 

I II n III II I I II II II I I I inn i in 

TGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGG 933 



523 



I I II I I II linn I III Ml I II I III 

AGCAG- GCATCAG - CTCCCT- CCAGACAGTGCCCGTGAC- CCTCCTGGGGCCTAGGGCCT 



989 



GCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCA 583 

III I Mill II I I Mill I I II Mill I Mil II 

GCA - GCCGGCTGCATGCAGC - TCCTGGGGGTGATGGCA - - GCCCTATT - CTGCCGGGGAT 1044 
GCCTGTG - ACACTGGGA - GCCAGCCTGCGGCCCCTCTGCCTGC - CCTATGCTGAC - CACC 639 

I MM II I I I I I II MM III I III I III nil 

GG- TGTGTAC - CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT - CTGGGGCACC 1101 

ACC - - TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCAT 697 

II III III II III I I iiiii II mil II III I I II 

ACTGGTGCATGA - GGTGAGGGGCACATGGTTCCTGGCCGGGCT - GCACAGCTTCGGAGAT 1159 

- CA - GCTCCCTCCA - GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 754 

III III III nil I i II I II nil I I M 

GCTTGCCAAGGCCCCGCCAG - GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT - GAGGACT 1217 

GCATGCAGCTCCTGGGGGTGATGGCAGCCCTA - TTCTGCCGGGGATGGTGTGTACCAGTG 813 

I I INI I II 11 I III III III nil III I I I II I 

GGGT-CAGCAGTTTGGACTG- -G-CAGGTCTACTTC-GCCGAGGAACCAGAGCCCGAG-G 1271 






Query: 


814 




Sbjct : 


1272 


5 


Query: 


870 




Sb j Ct : 


1332 


10 


Query: 


930 




Sbjct: 


1391 




Query: 


990 


15 


Sbjct: 


1451 




Query : 


1050 


20 


Sb j ct : 


1511 




Query: 


1110 




Sbjct: 


1571 


25 


Query: 


1170 




Sbjct: 


1631 




Score 


= 559 


30 


Identities « 




Query: 


1 


35 


Sbjct: 


93 




Query: 


61 




Sbjct: 


153 


40 


Query: 


118 




Sbjct: 


211 


45 


Query: 


174 




Sbjct : 


269 




Query : 


227 


50 


Sbjct: 


325 




Query: 


284 


55 


Sbjct: 


381 




Query: 


339 




Sbjct : 


435 


60 


Query : 


397 




Sb j ct : 


492 


65 


Query: 


457 




Sb j ct : 


551 




Query : 


514 


70 


Sb j ct : 


610 



CTGTGGGTG-A-GCTGCCCAGCTGTGAG- -GCCAACCAACCAGCTGCTGACAGGGGACCT 869 

III 1 II I llllll II I llllllllllllillMIIIIIIIIIIIII 

CTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACCT 1331 

GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 929 

llllllllllllll IIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

GGCCATTCTCAGGA - CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1390 
CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 989 

IN lllllllllll IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIII 

CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCTIGGCCCAGAAGCCCAGCAGCTGTGGG 1450 

AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1049 

I II lllllllllll lllll llllll lllllllllllllllllllllllllllllllllll 
AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1510 

GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1109 

lllllllllll llllllll III llllllllllllllllllllllllllllllllllllll 
GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1570 

CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 116 9 

llllllll! Mill lllll llllll! Mil II III IIMIIIIIIMIIIMII III III 

CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 163 0 
CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1200 

IIIIIIIMIIIIIMIIIIIIIIIIIMII 

CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1661 
(83.9 bits). Expect = 8.2e-17, P = 8.2e-17 

. 609/1017 (59%), Positives = 609/1017 (59%), Strand = Plus / Plus 

AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 6 0 

MIIIMIilllMIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIllllllllilllll 

AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 152 



IIIIIIIIMIIIIIIII III I I I I II I lllll 

TGCAGGGCCCCTGTCAGGTCTGATAGGGAGAAGAGAAGGAGCAGAAGGG - GAGGG - GCCT 

GACACTGGGTTCAGGCTGGCA- TCATCAG - - CTTTGCATCA - AGCTGTGCCCAGGAGGAC 

II lllll I II I I I III III II I I Mill III III 

AACCCTGGGCTGGGGGTTGGACTCA - CAGGACTGGGGGAAAGAGCTGCAATCAG- AGGGT 

GCTCCTGTGCT-GCTGACCA-ACACAGCTGCTCACAGTTCCTGGCTGCA-GGCTC G- 

I II II I MM I I II IN II I I I I III lllll I 

G - TC - TGCCATAGCTGGGCTCAGGCATCTG - TCCTTGG - CTTTGTTGCCTGGCTCCAGGG 

AG - TTCAGGGGGCAGCTTTCCTG - GCCCAGAGCCC - AGAGACCCCGGA(3ATGAGTGATGA 

II III llllll II I III III lllll I lllllllllll II 
AGATTCCGGGGGCC - CTGTGCTGTGCCTCGAGCCTGACGGACACTGG - GTTCAG - GCTG - 

GGACAGCTGTGTAGCCTGTGGATCCT - - TGAGGACAGCAGGTC - C - CCAG - GCAGGAGCA 

I II I I III I II III II I III III I I II I II I II 

-G- CATCA - TC - AGCTT - TGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTG - A 

CCCTCCCCATGGCCCTGGGAGG - CCAGGCTG- ATGC:ACCAGGGACAGCTGGCCTGTGGCG 

llllll II II II Mill llllll Ml III I I 

CCAACAC-A-GCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTT - CAGGGGGCAGCTTTCC 
GAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGG 

MM I lllll Mil II M M Mill II I I 



:CCCAGAGGAATGGAG - - CGTAGGGCTGGGG - ACCAGACCGGAGGAGTGGGGCCTGAAGC 513 

II lllll I II I III M I Ml M II III II 



^GCTCATCCTGCATGGAGC - CTACACCCACCCTGAGGGGGGCTA - C - GACATGGCCCTCC 570 

Ml M I II Ml Ml I III I I I M III I I 

• GCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGC 668 

65 



10 



15 



20 



25 



30 



Query : 


571 


Sbjct : 


669 


Query : 


630 


Sbj Ct : 


723 


Query: 


686 


Sbj ct : 


782 


Query: 


742 


Sbj ct : 


837 


Query: 


796 


Sbj ct : 


894 


Query: 


854 


Sbjct: 


951 


Query: 


913 


Sbj ct : 


1008 


Query: 


969 


Sbjct: 


1065 



TG - CTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTAT 629 

I mill Mil III II INI Mill III II II II 

TAACTGCTG-CCCA- -CTGCTTCATTGGGCGCCAGGCCCCT^GAGGAA-TGGA-GCG-TAG 722 
G - CTGACCACCAC - CTGCCTGA - TGGGGAGCGTGGCTGGGT - TCTGGGACGGGCCCGCCC 685 

I III MM I I II lllll I II I II II II II I 

GGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTAC - 781 
AGGAGCAGGCATCAGCTCC - CTCCAGACAGTGCCCGTGACCCTCCTGGG GCCTAGGG 741 

I II I I II III MM MM I II MM MM I 

ACC- -CACCC-TGAGGGGGGCTAC-GACATGGCCC-TCCTGCTGCTGGCCCAGCCTGTGA 836 
C - CTGC - AGCCGGC - TGCATGCAGCTCCTGGGGGTGATG - GCAG - CC - CTATTCTGCCGG 795 

I III MM II III M Ml M I I II I M II lllll I 

CACTGGGAGCCAGCCTGCG - GCCCCTC - TGCCTGCCCTATGCTGACCACCAC - CTGCCTG 893 

6GATGGTGTGTACCAGTGCTGTGGGT - GAGCT - GCCCAGCTGTGAGGCCAACCAACCAGC 853 

II I M I I I I MM I II IIIIM I MM 1 III I 

ATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGC-AGGC- -ATCAGCTCCC 950 

TGCTGACAGGGGACCTGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAA - ATGGCAT 912 

I I lllll I III II III II MM Mill I III III II 

TCCAGACAGTGCCCGTGACCCTCCTGGGGC-CTAGGGCCTGCAGCC-GGCTGCATG-CAG 1007 



- TACTGCCCCTG - TC - CTCCCC - ACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCC 

I III II I I Mi I MM II III I Mill II I 



CAGAAGCCCAGCAGCTGTGGGAAGGAACCTGCCTGGGGC - - CACAGGTGC 1016 

II II IMIIIII I II MM IIIIIM III Mill 

GTGA - GCTGCCCAGCTGTGAG - - GG - - CCTGTCTGGGGCACCACTGGTGC 1109 



968 



35 



Table 15. BLASTP identity search for the protein of the invention.

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO:73)
Length = 571
Plus strand HSPs:

Score = 1514 (533.0 bits), Expect = 1.6e-154, P = 1.6e-154
Identities = 278/279 (99%), Positives = 278/279 (99%), Frame = +3

RHLSNPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPV 182 

IIIIIIIIMI IIIIM Mlllllllllll IIIIIIMIIIII IIIIIIIIIIMIIIII 

RHLSNPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGI I S PASS CAQEDAPV 274 

LLTNTAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEA 3 62 

MIIIIIIIIIIIIIIIIIIIMIIIIIMIIIIIIIIIIIIIIIillllllllllllll 
LLTNTAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEA 334 

RLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTH 542 

lllllllllll IIIIIIIIIIIIIIIMIIIMII IIMIIIIIIIIIMIIIIIIMM 

RLMHQGQbACGGALVSEEAVLTAMCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTH 3 94 

PEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWVIiGRARPGAGISSLQTVPV 722 

IIIMIIIIIIIIIIIMIIIMIIIIII IIIMMIIIMIIMIIIIIIMIIIMI 

PEGGYDMALIiLLAQPVTLGASLRPLCLPYPDHHLPDGERGWVliGRARPGAGISSIiQTVPV 454 



Query : 


3 


Sbj ct : 


215 


Query: 


183 


Sbj ct : 


275 


Query: 


363 


Sbj ct : 


335 


Query: 


543 


Sbj ct : 


395 


Query: 


723 


Sbjct: 


455 


Score = 225, Expect = 4.6e-15, P = 4.6e-15
Identities = 71/203 (34%), Positives = 95/203 (46%), Frame = +3



66 
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PSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPE- -EWSVGLGT RP 494 

I lll^l ^ II I llllllll I I III W* I 

PGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSWSWIjGSLQREGLSP 122 

- -EEWGLKQLILHGAYTHPEGGYDMALLLIiAQPVTLGASLRPLCLPYADHHLPDGERGWV 668 
II 1^ I I II I I l^lll II I I lllll III I 

Sbjct: 123 G^ 

10 

+1* * + 1+ 1+ +11 + I lll+l II 
3TLRNLRLRL1SRPTCNCI YNQLHQRHLSN- - PARPGMLCG GPQP< 

PGHSQEQENAGRQMALLPLSS 929 
15 " * II I + 



20 



25 



30 



35 



45 



60 



Query: 


339 


Sbjct : 


63 


Query: 


495 


Sb j Ct : 


123 


Query: 


669 


Sbjct: 


179 


Query: 


834 


Sbjct: 


234 


Score 


a 125 


Identities ■• 


Query: 


15 


Sbjct: 


474 


Query: 


195 


Sbjct: 


532 



(44.0 bits). Expect = 0.00067, P = 0.00067 
32/95 (33%), Positives = 47/95 (49%), Frame = +3 



+ 1 IIIH I hi II I I I I I Ik II H I - I 

SPILPGMVCTSAV-GELPSCEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTJ 

TAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVA 299 
k k > ^ ^ h I I II ^+ I M 



>patp:Y90291 Human peptidase, HPEP-8 protein sequence - Homo sapiens, 2 67 aa, 

(SEQ ID NO: 74)
Length = 267
Plus strand HSPs:

Score = 1028 (361.9 bits), Expect = 5.0e-103, P = 5.0e-103
Identities = 189/189 (100%), Positives = 189/189 (100%), Frame = +3



50 Query: 63 3 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 812 

llllll IIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 180 



Query: 


273 


Sb j ct : 


1 


Query: 


453 


Sb j ct : 


61 


Query: 


633 


Sbjct: 


121 


Query: 


813 


Sbjct: 


181 


Score 


= 125 


Identities : 


Query: 


15 


Sb j ct : 


170 


Query: 


195 


Sb j ct : 


228 



MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 4 52 

MIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 
MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 6 0 

QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLIiAQPVTLGASLRPLCLPYA 632 

IIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 
QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 120 



55 I I I II I I I I 



(44.0 bits). Expect = 0.00016, P = 0.00016 
= 32/95 (33%), Positives = 47/95 (49%), Frame = +3 

NPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTN 194 

H llkl I hllll I I I I 11+ II ^1 I + I 

SPILPGMVCTSAV-GELPSCEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTA 227 



65 Query: 195 TAAHSSWLQARVQGAAFliAQSPETPEMSDEDSCVA 299 

1+ k + + + 1+ II II ++ IIH 
LPAYEDWVS S - LDWQVYFAEEPE - PE - AEPGSCLA 259 
Table 16. BLASTN identity search (versus the human SeqCalling database) for the
Peptidase-like protein of the invention.

>s3aq:132854740 Category D: 12 frag (12 non-5' sig-CG), 636 bp. (SEQ ID NO:75)
Length = 636

Minus Strand HSPs:

Score = 1423 (213.5 bits), Expect = 7.0e-59, P = 7.0e-59
Identities = 313/343 (91%), Positives = 313/343 (91%), Strand = Minus / Plus





Query i 


754 




QH^i r>t- ■ 
OUJ Cu ■ 


295 


15 


Query: 


695 




Sbjct: 


353 




Query: 


635 




Sb j Ct : 


413 




Query: 


575 


25 


Sbjct: 


473 




Query: 


■ 515 


30 


Sbjct: 


533 




Query : 


455 




Sbjct: 


593 


35 


Score 


= 757 




Identities < 




Query : 


869 


40 


Sb j ct : 


105 




Query: 


809 


45 


Sb j ct : 


163 




Query: 


74 9 




Sb j ct : 


223 


50 








>33aq: 


134913 

] 


55 


Plus 


St ram 




Score 


= 297 




Identities 


60 


Query: 


1138 




Sbjct: 


10 




Query : 


1198 



AGCCGGCTGCAG - GCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGAT 696 

III llllll I III I III II II II III I I I I I 

AGCTGGCTGCCCCGGCCT-GCAGGTTGGATGGAC7VGCAGCCCTGGCCCT-GTGCCCACCT 352 

GCCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 63 6 

lllllllllllllllllllMIIIMIilllllllllllllllllllllllllllllll 

ACCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCy^CGCTCCCCATCAGGCAGGTGGTG 412 
GTCAGCATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 576 

IMII IMIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIMII 

GTCAGGATAGGGCAGGtZAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 472 

CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 516 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 532 

CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 456 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
CTGCTTCIAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 592 



CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 

IIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 



412 



636 



«pect = 8.5e-29, P = 8.5e-29 
Positives = 165/179 (92%), Strand 



Minus / Plus 



AGGTCCCCTGTCAGCAGCTGGTTGGTTGGCCTCACAGCTGGGCAGCTCACCCACAGCACT 

nil III I nil II I iiiiiiiiiiiiiiiiiiiiiiiiiiiiiii 



810 



162 



GGTACACACCATCCCCGGCAG7UVTAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 750 

iiiiniiiiiiniiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii llllll III 

GGTACACACCATCCCCGGCAGAATA(3GGCTGCCATCACCCCCAGGAGCTGCATGCyVGCCG 222 

GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 691 

I II nil I III llllll II inn in I nil III I III I nil II inn III I II II 

GCTGCAGGCCCTAGGCCCCIAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 281 



Length =415 



II I iiiiiiiiiiiiiiiiiiiiiiiiiiiiiniiiiiiiiiniiiiiiiniii 
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III 
70 ATT 72 
Table 17. ClustalW alignment of the protein of the invention.

CG30817-06 

Y41704 MLLSS LV S LA CSV Y L AWI LF F VL YDFC I VC I T TYAI N VS LMWLS F RK V QEPQOKAKR HON 

Y90291 

CG50817-06 

Y41704 T VPO EWPWQASVRRQOAH I C SOS LVADT WV LT AAHC F EKAA ATE LNSWSVVLGSLQR EOL 
Y90291 

CG508 17-06 

Y41704 SPOAE EV0VAALQLPRAYNHY3Q0SDLA L LQ L AHPTTHTP L C LP QP AHRFPFOASCWATO 
Y90291 

CO50817-06 

Y41704 WDQDT SDAPOTLRNLRLRLI SRPTCNCI Y NQ LHQRHLSNP A R POj 
Y90291 

CO508 17-06 
Y41704 
Y90291 

CG50817.06 
Y41704 
Y90291 

CG50817-06 
Y417(W 
Y90291 

CO50817-06 
Y41704 
Y90291 

CG50817-06 ItfAlBiLWf >idl» Jsfrf Ja ! NOP R Q P Q ilglPEQEN AHR QMAL[L}PLSSiaP CH V 

Y41704 
y90291 

0050817^6 
Y41704 
Y90291 



^4LCGOPQPGVQ0P CQG 
^ylLCGOPQPGVQGP CQG 



DSGG P V LC L EPDGH WVQ AG I I SFASSCA Q E D A P V L LTNTA A H S S WLQ A RVQGAAF LA Q S P 
DSOG P V LC L EPDGHWVQAGI I SF ASSC A Q ED A P VLLTNTAA H S S WLQ ARVQGAAF LA QS P 



ETPEMSDEDSCVACGS LRTAGPQAGAPS P WP WEARLMHQGQ L AC G OA L VSEEAV LT A AHC- 
ETPE.M S D ED S CVACGS LRTAGPQAGAPS P WP WE ARLMHQGQ L AC G GA L VSEEAV LT A A HC 
VISDED SCVACGS LRTAG POAGAPS P WP WE ARLMHQGQ LACOGA LVSEEA VLT A AHC 



F IGRQ AP EEWSVGLGTRPEEWGLKQLI L H G A Y TH P EGGYDM A LL L LAQ PVTLGAS LR PLC 
F IGRQAP EEWSVGLGTRPEEWGLKQLI L H G AY TH P EGGYDMA LL L LAQ PVTLGAS LR PLC 
F I G R Q AP EEWSVGLGTRPEEWGL?:OLI L H G AY THP EGGYDMA LL L LAQ PVTLGAS LR P L C 



LPYAD HHLP DGERGWV LGRARPGAGi SS L Q T V P VTLLGP RA C S R LHA A PGGDGS P I L PGM 
LPYQd HHLP DGERGWV LGRARPGAGI SS L QT V P VTLLGP RA C S R LHA A PGGDGS P I L PGM 
L PYA D HH L P DGERGWV L GRAR PGAG I S S L QTV P VTLLGP RA C S R LHAA PGGDGS P I L PGM 



VCT S A VGEL PSCEI 

VCT S AVGELPSCEGLSGAPLVHEVRGTWF L AG LHS F GDACQ G P AR P A V FTALPAYEDWVS 
VCT S A VGEL PSCEGLSGAPL VHEVRGTWF L AG LHS F GDACQ G P AR PA V FTALP AYED WVS 



S LDWQ V Y F AEEPEP EA EPGS C LANI SQP T SC 
S LDWQ VYF AEEPEP EAEPGS C LANI SQP T SC 



Information for the ClustalW proteins:

Accno        Common Name                                 Length
CG50817-06   novel Peptidase-like protein
Y41704       Human PRO351 protein sequence.              571
Y90291       Human peptidase, HPEP-8 protein sequence.   267



In the alignment shown above, black outlined amino acid residues indicate regions of
conserved sequence (i.e., regions that may be required to preserve structural or functional
properties); greyed amino acid residues can be mutated to a residue with comparable steric
and/or chemical properties without altering protein structure or function (e.g. L to V, I, or M);
non-highlighted amino acid residues can potentially be mutated to a much broader extent without
altering structure or function.

Table 18. Psort, Signal P and hydropathy results for CG50817-06

cytoplasm                    Certainty=0.4500 (Affirmative) < succ>
microbody (peroxisome)       Certainty=0.3000 (Affirmative) < succ>
lysosome (lumen)             Certainty=0.2334 (Affirmative) < succ>
mitochondrial matrix space   Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

Measure   Position   Value   Cutoff   Conclusion
max. C    45         0.253   0.37     NO
max. Y    17         0.064   0.34     NO
max. S    68         0.536   0.88     NO
mean S    1-16       0.130   0.48     NO
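
As a reading aid, the Conclusion column of the signal peptide table can be reproduced by comparing each value with its cutoff. The Python sketch below assumes that a measure is scored YES only when its value reaches the cutoff; the names `SIGNALP_ROWS`, `conclusion` and `signalp_conclusions` are illustrative and are not part of the SignalP program.

```python
# Rows transcribed from Table 18: (measure, position, value, cutoff).
SIGNALP_ROWS = [
    ("max. C", "45", 0.253, 0.37),
    ("max. Y", "17", 0.064, 0.34),
    ("max. S", "68", 0.536, 0.88),
    ("mean S", "1-16", 0.130, 0.48),
]


def conclusion(value: float, cutoff: float) -> str:
    """Assumed decision rule: YES when the value reaches the cutoff, else NO."""
    return "YES" if value >= cutoff else "NO"


def signalp_conclusions(rows):
    """Return a {measure: conclusion} mapping for every table row."""
    return {measure: conclusion(value, cutoff)
            for measure, _position, value, cutoff in rows}


if __name__ == "__main__":
    for measure, verdict in signalp_conclusions(SIGNALP_ROWS).items():
        print(f"{measure}: {verdict}")
```

Under this assumed rule every measure falls below its cutoff, which agrees with the NO entries listed in the table.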

SECP14 

A SECP14 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:48) and encoded polypeptide sequence (SEQ ID NO:49) of clone
CG50817-06 directed toward novel serine protease-like proteins and nucleic acids encoding
them. Figure 19 illustrates the nucleic acid and amino acid sequences, respectively.
This clone includes a nucleotide sequence (SEQ ID NO:48) of 1214 bp. The nucleotide
sequence includes an open reading frame (ORF) beginning with an ATG initiation codon at
nucleotides 31-33 and ending at nucleotides 1186-1188. Putative untranslated regions, if any,
are found upstream from the initiation codon and downstream from the termination codon. The
encoded protein having 385 amino acid residues is presented using the one-letter code in Figure
19. The protein encoded by clone CG51099-03 is predicted by the PSORT program to localize
outside of the membrane with a certainty of 0.5804, and appears to be a signal protein (see Table
22 below).
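
The relationship between the stated ORF coordinates and the stated protein length can be checked with simple arithmetic. The following Python sketch assumes that nucleotides 1186-1188 are the termination codon (so that one codon is subtracted); `orf_residue_count` is a hypothetical helper name used only for illustration.

```python
def orf_residue_count(start_nt: int, end_nt: int) -> int:
    """Residues encoded by an ORF given 1-based, inclusive coordinates.

    Assumption: end_nt is the last base of the termination codon, so the
    final codon does not contribute an amino acid residue.
    """
    length = end_nt - start_nt + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1


if __name__ == "__main__":
    # ATG at nucleotides 31-33, ORF ending at nucleotides 1186-1188.
    print(orf_residue_count(31, 1188))  # prints 385
```

The result, 385, matches the number of amino acid residues given for the encoded protein.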

The serine protease tryptase (EC 3.4.21.59), which is almost exclusively
expressed in mast cells, is released by mast cell degranulation in an enzymatically active form
together with other mediators, e.g. histamine, into the extracellular space and the circulation. The
capability of the enzyme to directly stimulate several cell types as well as to cleave polypeptide
hormones and to activate pro-enzymes suggests a role for tryptase in inflammatory and
tissue-remodeling processes. Therefore, in the skin, a role of tryptase is suggested not only in
mastocytosis and immediate type hypersensitivity reactions, but also in other inflammatory
diseases, degenerative or neoplastic conditions as well as in wound healing, where an
accumulation and/or activation of mast cells is found. Extracellular tryptase may be superior to
histamine as a parameter for the onset and course of immediate type reactions and as an indicator
for the activation of mast cells in other conditions. Its absence during histamine-liberating
reactions may suggest basophil activation. In addition, tryptase has been shown to be a sensitive
and specific marker for the localization of mast cells in tissues (Ludolf-Hauser et al., 1999,
Hautarzt 50:556-61).

Tryptases are stored in abundance in the secretory granules of mouse (McNeil et al., 1992,
Proc. Natl. Acad. Sci. U. S. A. 89, 11174-11178; Johnson, D. A., and Barton, G., 1992, Protein
Sci. 1, 370-377), and human (Vanderslice et al., 1990, Proc. Natl. Acad. Sci. U. S. A. 87,
3811-3815) mast cells (MCs). In humans, the four homologous tryptases (designated tryptases I, II/β,
III, and α) that have been cloned reside at a complex on chromosome 16 (Pallaoro et al., 1999, J.
Biol. Chem. 274, 3355-3362). Although only two tryptases (designated mouse MC protease
(mMCP) 6 and mMCP-7) have been identified so far in the mouse, their genes reside ~1.2
centimorgans away from each other on the syntenic region of mouse chromosome 17 (Gurish et
al., 1994, Mammal. Genome 5, 656-657). Despite the chromosomal clustering of their genes,
these mouse tryptases are differentially regulated in vivo (Reynolds et al., 1990, Proc. Natl.
Acad. Sci. U. S. A. 87, 3230-3234) and in vitro (Reynolds et al., 1991, J. Biol. Chem. 266,
3847-3853; McNeil et al., 1992, Proc. Natl. Acad. Sci. U. S. A. 89, 11174-11178) at the levels of gene
transcription (Morii et al., 1996, Blood 88, 2488-2494) and mRNA stability.

All known mouse and human tryptases in this family are initially translated as zymogens.
They possess an ~20-residue hydrophobic signal peptide which is presumed to be removed in the
endoplasmic reticulum immediately after the translated zymogen is translocated into the lumen.
They also possess an ~10-residue propeptide preceding the mature portion of the enzyme which
consists of ~245 amino acids. Although tryptases undergo variable N-linked glycosylation during
their biosynthesis (Ghildyal et al., 1994, J. Immunol. 153, 2624-2630), the current members of
the family appear to be targeted to the secretory granule by a serglycin proteoglycan-dependent
mechanism (Ghildyal et al., 1996, J. Exp. Med. 184, 1061-1073) rather than by a
Man-PO4-dependent mechanism as are classical lysosomal enzymes.

Recently, Wong et al. (1999, J Biol Chem 274, 30784-30793) described a novel mouse
gene, and its human ortholog, which encode an unusual transmembrane tryptase (TMT).
Comparative structural studies indicated that the putative transmembrane tryptase (TMT)
possesses a unique substrate-binding cleft. As assessed by RNA blot analyses, mTMT is
expressed in mice in both strain- and tissue-dependent manners. Thus, different transcriptional
and/or post-transcriptional mechanisms are used to control the expression of mTMT in vivo.
Analysis of the corresponding tryptase locus in the human genome resulted in the isolation and
characterization of the hTMT gene. The hTMT transcript is expressed in numerous tissues and is
also translated. Analysis of the tryptase family of genes in mice and humans now indicates that a
primordial serine protease gene duplicated early and often during the evolution of mammals to
generate a panel of homologous tryptases in each species that differ in their tissue expression,
substrate specificities, and physical properties.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 1213 of 1213 bases (100%) identical to a
gb:GENBANK-ID:AX079882|acc:AX079882.1 mRNA from Homo sapiens (Sequence 13 from Patent
WO0105971) (See Table 19). The full amino acid sequence of the protein of the invention was
found to have 385 of 385 amino acid residues (100%) identical to, and 385 of 385 amino acid
residues (100%) similar to, the 385 amino acid residue ptnr:SPTREMBL-ACC:Q9UI38 protein
from Homo sapiens (Human) (TESTES-SPECIFIC PROTEIN TSP50) (See Table 20).
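
The identity figures quoted here and in the BLAST tables of this document are ratios of matching positions to aligned positions. The Python sketch below assumes that the printed percentages are truncated to whole numbers rather than rounded (an inference from entries such as 609/1017 being printed as 59%); `reported_percent` and `REPORTED` are hypothetical names introduced for illustration.

```python
def reported_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated (assumed reporting convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# (matches, aligned length, percentage as printed in this document)
REPORTED = [
    (1213, 1213, 100),
    (385, 385, 100),
    (278, 279, 99),
    (313, 343, 91),
    (609, 1017, 59),
    (71, 203, 34),
]

if __name__ == "__main__":
    for matches, aligned, printed in REPORTED:
        computed = reported_percent(matches, aligned)
        print(f"{matches}/{aligned}: computed {computed}%, printed {printed}%")
```

Truncation is used here because ordinary rounding would give 60% for 609/1017 and 35% for 71/203, which differ from the printed values.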

A multiple sequence alignment is given in Table 21, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences.

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:

Model     Domain   seq-f   seq-t   hmm-f   hmm-t   score   E-value
trypsin   1/2      118     297     6       199     104.4   2.6e-32
trypsin   2/2      313     353     215     259     35.9    1.6e-10
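
To make the domain table easier to interpret, the following Python sketch records the two trypsin hits and computes how many residues of the 385-residue protein each hit spans. It assumes that seq-f and seq-t are 1-based, inclusive positions on the protein and that the two hits do not overlap; `DomainHit`, `span` and `covered_fraction` are hypothetical names.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    model: str
    domain: str
    seq_from: int   # seq-f: first matched residue of the protein
    seq_to: int     # seq-t: last matched residue of the protein
    hmm_from: int   # hmm-f: first matched position of the model
    hmm_to: int     # hmm-t: last matched position of the model
    score: float
    e_value: float


# Values transcribed from the domain table.
HITS = [
    DomainHit("trypsin", "1/2", 118, 297, 6, 199, 104.4, 2.6e-32),
    DomainHit("trypsin", "2/2", 313, 353, 215, 259, 35.9, 1.6e-10),
]


def span(hit: DomainHit) -> int:
    """Number of protein residues covered by one hit (inclusive coordinates)."""
    return hit.seq_to - hit.seq_from + 1


def covered_fraction(hits: List[DomainHit], protein_length: int) -> float:
    """Fraction of the protein covered, assuming the hits do not overlap."""
    return sum(span(h) for h in hits) / protein_length


if __name__ == "__main__":
    for hit in HITS:
        print(hit.model, hit.domain, span(hit))
    print(f"{covered_fraction(HITS, 385):.2f}")
```

The two hits together cover 221 residues, a little over half of the encoded protein.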

The catalytic activity of the serine proteases from the trypsin family is provided by a
charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which
itself is hydrogen-bonded to a serine. The sequences in the vicinity of the active site serine and
histidine residues are well conserved in this family of proteases (Sprang et al., 1987, Science
237:905-909). A partial list of proteases known to belong to the trypsin family is shown below.



- Acrosin.
- Blood coagulation factors VII, IX, X, XI and XII, thrombin, plasminogen,
  and protein C.
- Cathepsin G.
- Chymotrypsins.
- Complement components C1r, C1s, C2, and complement factors B, D and I.
- Complement-activating component of RA-reactive factor.
- Cytotoxic cell proteases (granzymes A to H).
- Duodenase I.
- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin).
- Enterokinase (EC 3.4.21.9) (enteropeptidase).
- Hepatocyte growth factor activator.
- Hepsin.
- Glandular (tissue) kallikreins (including EGF-binding protein types A, B,
  and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and
  tonin).
- Plasma kallikrein.
- Mast cell proteases (MCP) 1 (chymase) to 8.
- Myeloblastin (proteinase 3) (Wegener's autoantigen).
- Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV.
- Tryptases.
- Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin,
  and protein C activator.
- Collagenase from common cattle grub and collagenolytic protease from
  Atlantic sand fiddler crab.
- Apolipoprotein(a).
- Blood fluke cercarial protease.
- Drosophila trypsin like proteases: alpha, easter, snake-locus.
- Drosophila protease stubble (gene sb).
- Major mite fecal allergen Der p III.

All the above proteins belong to family S1 in the classification of peptidases.



This indicates that the sequence of the invention has properties similar to those of other 
proteins known to contain this/these domain(s) and similar to the properties of these domains. 



Chromosomal information: 



The Serine Protease-like gene disclosed in this invention maps to chromosome 3. This
assignment was made using mapping information associated with genomic clones, public genes
and ESTs sharing sequence identity with the disclosed sequence and CuraGen Corporation's
Electronic Northern bioinformatic tool.



Tissue expression 



The Serine Protease-like gene disclosed in this invention is expressed in at least the
following tissues: adipose, adrenal gland, thyroid, brain, heart, skeletal muscle, bone marrow,
colon, bladder, liver, lung, mammary gland, placenta, testis. Expression information was derived
from the tissue sources of the sequences that were included in the derivation of the sequence of
CuraGen Acc. No. CG51099-03. The sequence is predicted to be expressed in the following
tissues because of the expression pattern of a closely related homolog in species Homo sapiens
(gb:GENBANK-ID:AX079882|acc:AX079882.1, Sequence 13 from Patent WO0105971): testis.

Cellular Localization and Sorting

The PSORT, SignalP and hydropathy profile for the Serine Protease-like protein are
shown in Table 22. The results predict that this sequence has a signal peptide and is likely to be
localized extracellularly with a certainty of 0.5804. The signal peptide is predicted by SignalP to
be cleaved between amino acids 39 and 40: CWG-AG.
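
The predicted cleavage position can be illustrated by translating the start of the open reading frame. The Python sketch below uses the first 180 query nucleotides as printed in Table 19 and the ATG at nucleotide 31; the standard genetic code and the names `CODON_TABLE`, `translate` and `split_signal_peptide` are assumptions introduced for illustration, not part of the disclosed methods.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Query nucleotides 1-180 as printed in Table 19.
QUERY_1_180 = (
    "CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC"
    "GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG"
    "TTGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT"
)
ORF_START = 31        # 1-based position of the A of the ATG initiation codon
CLEAVAGE_AFTER = 39   # predicted cleavage between residues 39 and 40


def translate(dna: str, start_nt: int) -> str:
    """Translate complete codons from a 1-based start position."""
    coding = dna[start_nt - 1:]
    usable = len(coding) - len(coding) % 3
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


def split_signal_peptide(protein: str, cleavage_after: int):
    """Return (signal peptide, remainder) for a cleavage after the given residue."""
    return protein[:cleavage_after], protein[cleavage_after:]


if __name__ == "__main__":
    n_terminus = translate(QUERY_1_180, ORF_START)
    signal, remainder = split_signal_peptide(n_terminus, CLEAVAGE_AFTER)
    print(signal[-3:] + "-" + remainder[:2])  # prints CWG-AG
```

Under these assumptions, residues 37-39 are CWG and residues 40-41 are AG, consistent with the CWG-AG cleavage site stated above.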

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding a Serine Protease-like protein includes
the nucleic acid whose sequence is provided in Figure 19, or a fragment thereof. The invention
also includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 19 while still encoding a protein that maintains its Serine
Protease-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to the sequence of
CuraGen Acc. No. CG51099-03, including nucleic acid fragments that are complementary to any
of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic
acid fragments, or complements thereto, whose structures include chemical modifications. Such
modifications include, by way of non-limiting example, modified bases, and nucleic acids whose
sugar phosphate backbones are modified or derivatized. These modifications are carried out at
least in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In
the mutant or variant nucleic acids, and their complements, up to about 0% of the bases may be
so changed.

The novel protein of the invention includes the Serine Protease-like protein whose
sequence is provided in Figure 19. The invention also includes a mutant or variant protein any of
whose residues may be changed from the corresponding residue shown in Figure 19 while still
encoding a protein that maintains its Serine Protease-like activities and physiological functions,
or a functional fragment thereof. In the mutant or variant protein, up to about 0% of the amino
acid residues may be so changed.

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Serine Protease-like
protein may have important structural and/or physiological functions characteristic of the
Trypsin family. Therefore, the nucleic acids and proteins of the invention are useful in potential
diagnostic and therapeutic applications and as a research tool. These include serving as a specific
or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or
amount of the nucleic acid or the protein is to be assessed. These also include potential
therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule
drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody),
(iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent promoting
tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: adrenoleukodystrophy,
congenital adrenal hyperplasia, hyperthyroidism, hypothyroidism, Von Hippel-Lindau (VHL)
syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease,
Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis,
ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain,
neurodegeneration, cardiomyopathy, atherosclerosis, hypertension, congenital heart defects,
aortic stenosis, atrial septal defect (ASD), atrioventricular (A-V) canal defect, ductus arteriosus,
pulmonary stenosis, subaortic stenosis, ventricular septal defect (VSD), valve diseases,
scleroderma, obesity, transplantation, muscular dystrophy, myasthenia gravis, hemophilia,
hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune disease, allergies,
immunodeficiencies, graft versus host disease, cirrhosis, systemic lupus erythematosus, asthma,
emphysema, ARDS, fertility, cancer, as well as other diseases, disorders and conditions.
These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.



Table 19. BLASTN search using CuraGen Acc. No. CG51099-03.

>gb:GENBANK-ID:AX079882|acc:AX079882.1 Sequence 13 from Patent WO0105971 - Homo
sapiens, 1359 bp. (SEQ ID NO:77)
Length = 1359

Plus strand HSPs:

Score = 6065 (910.0 bits), Expect = 4.8e-268, P = 4.8e-268
Identities = 1213/1213 (100%), Positives = 1213/1213 (100%), Strand = Plus / Plus

Query:    1 CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   15 CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC 74

Query:   61 GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   75 GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG 134

Query:  121 TTGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  135 TTGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT 194

Query:  181 GATCCCGCCGACCAGAGCGTCCAGTGTGTCCCCAAGGCCACCTGTCCTTCCAGCCGGCCT 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  195 GATCCCGCCGACCAGAGCGTCCAGTGTGTCCCCAAGGCCACCTGTCCTTCCAGCCGGCCT 254

Query:  241 CGCCTTCTCTGGCAGACCCCGACCACCCAGACACTGCCCTCGACCACCATGGAGACCCAA 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  255 CGCCTTCTCTGGCAGACCCCGACCACCCAGACACTGCCCTCGACCACCATGGAGACCCAA 314

Query:  301 TTCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCTTTTCCTACGAGCAG 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  315 TTCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCTTTTCCTACGAGCAG 374

Query:  361 GACCCCACCCTCAGGGACCCAGAAGCCGTGGCTCGGCGGTGGCCCTGGATGGTCAGCGTG 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  375 GACCCCACCCTCAGGGACCCAGAAGCCGTGGCTCGGCGGTGGCCCTGGATGGTCAGCGTG 434

Query:  421 CGGGCCAATGGCACACACATCTGTGCCGGCACCATCATTGCCTCCCAGTGGGTGCTGACT 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  435 CGGGCCAATGGCACACACATCTGTGCCGGCACCATCATTGCCTCCCAGTGGGTGCTGACT 494

Query:  481 GTGGCCCACTGCCTGATCTGGCGTGATGTTATCTACTCAGTGAGGGTGGGGAGTCCGTGG 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  495 GTGGCCCACTGCCTGATCTGGCGTGATGTTATCTACTCAGTGAGGGTGGGGAGTCCGTGG 554

Query:  541 ATTGACCAGATGACGCAGACCGCCTCCGATGTCCCGGTGCTCCAGGTCATCATGCATAGC 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  555 ATTGACCAGATGACGCAGACCGCCTCCGATGTCCCGGTGCTCCAGGTCATCATGCATAGC 614

Query:  601 AGGTACCGGGCCCAGCGGTTCTGGTCCTGGGTGGGCCAGGCCAACGACATCGGCCTCCTC 660
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  615 AGGTACCGGGCCCAGCGGTTCTGGTCCTGGGTGGGCCAGGCCAACGACATCGGCCTCCTC 674

Query:  661 AAGCTCAAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCCATCTGCCTGCCTGGCACG 720
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  675 AAGCTCAAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCCATCTGCCTGCCTGGCACG 734

Query:  721 GACTATGTGTTGAAGGACCATTCCCGCTGCACTGTGACGGGCTGGGGACTTTCCAAGGCT 780
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  735 GACTATGTGTTGAAGGACCATTCCCGCTGCACTGTGACGGGCTGGGGACTTTCCAAGGCT 794

Query:  781 GACGGCATGTGGCCTCAGTTCCGGACCATTCAGGAGAAGGAAGTCATCATCCTGAACAAC 840
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  795 GACGGCATGTGGCCTCAGTTCCGGACCATTCAGGAGAAGGAAGTCATCATCCTGAACAAC 854

Query:  841 AAAGAGTGTGACAATTTCTACCACAACTTCACCAAAATCCCCACTCTGGTTCAGATCATC 900
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  855 AAAGAGTGTGACAATTTCTACCACAACTTCACCAAAATCCCCACTCTGGTTCAGATCATC 914

Query:  901 AAGTCCCAGATGATGTGTGCGGAGGACACCCACAGGGAGAAGTTCTGCTATGAGCTAACT 960
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  915 AAGTCCCAGATGATGTGTGCGGAGGACACCCACAGGGAGAAGTTCTGCTATGAGCTAACT 974

Query:  961 GGAGAGCCCTTGGTCTGCTCCATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGG 1020
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  975 GGAGAGCCCTTGGTCTGCTCCATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGG 1034

Query: 


1021 


Sbjct: 


1035 


Query : 


1081 


Sbjct: 


1095 


Query : 


1141 


Sbjct: 


1155 


Query : 


1201 


Sb j Ct : 


1215 


Table 


20 . 



GGTGCAGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGGTCTCCTCCTACCAA 1080 

IIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

20 Sbjct: 103 5 GGTGCAGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGGTCTCCTCCTACCAA 1094 



CACTGGATCTGGGACTGCCTCAACGGGCAGGCCCTGGCCCTGCCAGCCCCATCCAGGACC 1140 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMMIMIIIII 

CACTGGATCTGGGACTGCCTCAACGGGCAGGCCCTGGCCCTGCCAGCCCCATCCAGGACC 1154 

CTGCTCCTGGCACTCCCACTGCCCCTCAGCCTCCTTGCTGCCCTCTGACTCTGTGTGCCC 1200 

IIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIillllllllllllllllM 

CTGCTCCTGGCACTCCCACTGCCCCTCAGCCTCCTTGCTGCCCTCTGACTCTGTGTGCCC 1214 



lllllllllllll 



BLASTP search using the protein of CuraGen Acc . No. CG51099-03. 
>ptnr:SPTREMBL-ACC:Q9UI38 TESTES-SPECIFIC PROTEIN TSP50 - Homo sapiens (Human),
385 aa. (SEQ ID NO:78)
Length = 385

Score = 2090 (735.7 bits), Expect = 4.5e-216, P = 4.5e-216
Identities = 385/385 (100%), Positives = 385/385 (100%)

Query: 1    MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1    MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV 60

Query: 61   PKATCPSSRPRLLWQTPTTQTLPSTTMETQFPVSEGKVDPYRSCGFSYEQDPTLRDPEAV 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61   PKATCPSSRPRLLWQTPTTQTLPSTTMETQFPVSEGKVDPYRSCGFSYEQDPTLRDPEAV 120

Query: 121  ARRWPWMVSVRANGTHICAGTIIASQWVLTVAHCLIWRDVIYSVRVGSPWIDQMTQTASD 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121  ARRWPWMVSVRANGTHICAGTIIASQWVLTVAHCLIWRDVIYSVRVGSPWIDQMTQTASD 180

Query: 181  VPVLQVIMHSRYRAQRFWSWVGQANDIGLLKLKQELKYSNYVRPICLPGTDYVLKDHSRC 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181  VPVLQVIMHSRYRAQRFWSWVGQANDIGLLKLKQELKYSNYVRPICLPGTDYVLKDHSRC 240

Query: 241  TVTGWGLSKADGMWPQFRTIQEKEVIILNNKECDNFYHNFTKIPTLVQIIKSQMMCAEDT 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241  TVTGWGLSKADGMWPQFRTIQEKEVIILNNKECDNFYHNFTKIPTLVQIIKSQMMCAEDT 300

Query: 301  HREKFCYELTGEPLVCSMEGTWYLVGLVSWGAGCQKSEAPPIYLQVSSYQHWIWDCLNGQ 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301  HREKFCYELTGEPLVCSMEGTWYLVGLVSWGAGCQKSEAPPIYLQVSSYQHWIWDCLNGQ 360

Query: 361  ALALPAPSRTLLLALPLPLSLLAAL 385
            |||||||||||||||||||||||||
Sbjct: 361  ALALPAPSRTLLLALPLPLSLLAAL 385
Table 21. ClustalW alignment of CG51099-03 protein with related proteins.

[ClustalW alignment of Q9UI38, CG51099-03, TEST_HUMAN and PSS8_HUMAN; the shaded residue rows
are not recoverable from the source text.]

Information for the ClustalW proteins:

Accno        Common Name                                                               Length
CG51099-03   novel Serine Protease-like protein                                        385
TEST_HUMAN   TESTISIN PRECURSOR (EC 3.4.21.-) (EOSINOPHIL SERINE PROTEASE 1) (ESP-1).  314
PSS8_HUMAN   PROSTASIN PRECURSOR (EC 3.4.21.-).                                        343
Q9UI38       TESTES-SPECIFIC PROTEIN TSP50.                                            385


In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g. the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.



Table 22. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG51099-03.

outside                           Certainty=0.5804 (Affirmative) < succ>
lysosome (lumen)                  Certainty=0.5144 (Affirmative) < succ>
microbody (peroxisome)            Certainty=0.1203 (Affirmative) < succ>
endoplasmic reticulum (membrane)  Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
  max. C    40         0.888   0.37     YES
  max. Y    40         0.848   0.34     YES
  max. S    30         0.975   0.88     YES
  mean S    1-39       0.708   0.48     YES

# Most likely cleavage site between pos. 39 and 40: CWG-AG

[Hydropathy plot for CG51099-03 with a window of 19; x-axis: Amino Acid Number (0 to 400).]
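
A hydropathy profile with a window of 19, as referenced for CG51099-03 above, can be sketched as
a sliding-window average over per-residue hydropathy values. The sketch below assumes the
Kyte-Doolittle scale, which the source does not name; the function name and the example input
(the first 60 residues shown in Table 20) are chosen only for illustration and are not the
program used to generate the original plot.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not state which was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq.upper()]
    running = sum(scores[:window])
    profile = [running / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]
        profile.append(running / window)
    return profile


# Hypothetical usage: N-terminal 60 residues of CG51099-03 as listed in Table 20.
n_terminus = "MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV"
profile = hydropathy_profile(n_terminus, window=19)
peak_start = profile.index(max(profile)) + 1  # 1-based start of the most hydrophobic window
print(len(profile), peak_start)
```

A peak in such a profile near the amino terminus would be consistent with a hydrophobic signal
sequence, but the profile alone does not establish a cleavage site.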
SECP 15

A SECP 15 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:50) and encoded polypeptide sequence (SEQ ID NO:51) of clone
CG57051-04, directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:50) of 937 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 242 amino acid residues is
presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is
predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of
0.8200, and appears to be a signal protein (see Table 27 below).
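
The ORF coordinates given above can be checked for internal consistency with a few lines of
arithmetic. The following is a minimal sketch; the function name and dictionary keys are
hypothetical, and the coordinates are treated as 1-based and inclusive, which is an assumption
about the numbering convention.

```python
def orf_summary(start, stop_end, total_len):
    """Summarise an ORF from 1-based inclusive coordinates.

    start     -- first base of the initiation codon
    stop_end  -- last base of the stop codon
    total_len -- length of the full nucleotide sequence
    """
    orf_len = stop_end - start + 1
    if orf_len % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {
        "orf_nt": orf_len,
        "residues": orf_len // 3 - 1,  # the stop codon encodes no residue
        "utr5_nt": start - 1,
        "utr3_nt": total_len - stop_end,
    }


# Coordinates stated for clone CG57051-04: ATG at 155-157, TAG at 881-883, 937 bp total.
summary = orf_summary(start=155, stop_end=883, total_len=937)
print(summary)
```

Under these assumptions the ORF spans 729 nucleotides and encodes 242 residues, which agrees
with the stated protein length, and leaves 154 and 54 nucleotides upstream and downstream.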

PPARG ANGIOPOIETIN-RELATED PROTEIN - PGAR: 

Background

The peroxisome proliferator-activated receptors (PPARs) are members of the nuclear 
hormone receptor subfamily of transcription factors. PPARs form heterodimers with retinoid X 
receptors (RXRs) and these heterodimers regulate transcription of various genes. There are 3 
known subtypes of PPARs, PPAR-alpha (170998), PPAR-delta (600409), and PPAR-gamma. 
PPAR-gamma is believed to be involved in adipocyte differentiation. Tontonoz et al. (1994)
found 2 isoforms of PPAR-gamma in mouse, gamma-1 and gamma-2, resulting from the use of
different initiator methionines.

Elbrecht et al. (1996) cloned cDNAs of PPAR-gamma-1 and PPAR-gamma-2 from
human fat cell cDNA by PCR using primers based on the mouse sequence and on a previously
published human cDNA sequence (Greene et al., 1995). They found that the human PPAR-gamma-1
and PPAR-gamma-2 genes have identical sequences except that PPAR-gamma-2
contains an additional 84 nucleotides at its 5-prime end. The sequences obtained by Elbrecht et
al. (1996) differed at 3 sites from the previously published human PPAR-gamma-1 sequence of
Greene et al. (1995). By Northern blot analysis, Elbrecht et al. (1996) found that human
PPAR-gamma is expressed at high levels in adipocytes and at a much lower level in bone marrow,
spleen, testis, brain, skeletal muscle, and liver.

The thiazolidinediones are synthetic compounds that can normalize elevated plasma 
glucose levels in obese, diabetic rodents and may be efficacious therapeutic agents for the 
treatment of noninsulin-dependent diabetes mellitus. Lehmann et al. (1995) identified the
thiazolidinediones as high affinity ligands for mouse PPAR-gamma receptors. Elbrecht et al.
(1996) confirmed that human PPAR-gamma-1 and PPAR-gamma-2 have similar activity and
determined that 3 different thiazolidinedione compounds are agonists of PPAR-gamma-1 and
PPAR-gamma-2. Elbrecht et al. (1996) speculated that the antidiabetic activity of the
thiazolidinediones in humans is mediated through the activation of PPAR-gamma-1 and
PPAR-gamma-2.

The nuclear receptor PPARG/RXRA heterodimer regulates glucose and lipid homeostasis 
and is the target for the antidiabetic drugs GI262570 and the thiazolidinediones. Gampe et al. 
(2000) reported the crystal structures of the PPARG and RXRA ligand-binding domains 
complexed with the RXRA ligand 9-cis-retinoic acid, the PPARG agonist GI262570, and
coactivator peptides. The structures provided a molecular understanding of the ability of RXRs
to heterodimerize with many nuclear receptors and of the permissive activation of the 
PPARG/RXRA heterodimer by 9-cis-retinoic acid. 

Mueller et al. (1998) showed that PPAR-gamma is expressed at significant levels in 
human primary and metastatic breast adenocarcinomas. Ligand activation of this receptor in
cultured breast cancer cells caused extensive lipid accumulation, changes in breast epithelial
gene expression associated with a more differentiated, less malignant state, and a reduction in
growth rate and clonogenic capacity of the cells. Inhibition of MAP kinase, a powerful negative
regulator of PPAR-gamma, improves the thiazolidinedione ligand sensitivity of nonresponsive
cells. These data suggested that the PPAR-gamma transcriptional pathway can induce terminal
differentiation of malignant breast epithelial cells. 

Tontonoz et al. (1994) identified a novel adipocyte-specific transcription factor, which 
they termed ARF6, and showed that it is a heterodimeric complex of RXRA and PPARG. (This 
ARF6 is not to be confused with ADP-ribosylation factor 6 (600464), which is also symbolized
ARF6.) Tontonoz et al. (1995) demonstrated that PPAR-gamma-2 regulates adipocyte expression
of the phosphoenolpyruvate carboxykinase gene (PCK1, 261680; PCK2, 261650).

The formation of foam cells from macrophages in the arterial wall is characterized by
dramatic changes in lipid metabolism, including increased expression of scavenger receptors and
the uptake of oxidized low density lipoprotein (oxLDL). Tontonoz et al. (1998) demonstrated
that the nuclear receptor PPAR-gamma is induced in human monocytes following exposure to
oxLDL and is expressed at high levels in the foam cells of atherosclerotic lesions. Ligand
activation of the PPAR-gamma:RXR-alpha heterodimer in myelomonocytic cell lines induced
changes characteristic of monocytic differentiation and promoted uptake of oxLDL through
transcriptional induction of the scavenger receptor CD36. These results revealed a novel
signaling pathway controlling differentiation and lipid metabolism in monocytic cells. Tontonoz
et al. (1998) suggested that endogenous PPAR-gamma ligands may be important regulators of
gene expression during atherogenesis.

Nagy et al. (1998) demonstrated that oxLDL activates PPAR-gamma-dependent 
transcription through a signaling pathway involving scavenger receptor-mediated particle uptake. 
Moreover, they identified 2 of the major oxidized linoleic acid metabolite components of 
oxLDL, 9-HODE and 13-HODE, as endogenous activators and ligands of PPAR-gamma. The
authors found that the biologic effects of oxLDL are coordinated by 2 sets of receptors, one on 
the cell surface, which binds and internalizes the particle, and one in the nucleus, which is 
transcriptionally activated by its component lipids. Nagy et al. (1998) suggested that PPAR- 
gamma may be a key regulator of foam cell gene expression. 

Chawla et al. (2001) provided evidence that in addition to lipid uptake, PPARG regulates
a pathway of cholesterol efflux. PPARG induces ABCA1 (600046) expression and cholesterol
removal from macrophages through a transcriptional cascade mediated by the nuclear receptor
LXRA (NR1H3; 602423). Ligand activation of PPARG leads to primary induction of LXRA and
to coupled induction of ABCA1. Transplantation of PPAR null bone marrow into Ldlr -/- mice
resulted in a significant increase in atherosclerosis, consistent with the hypothesis that regulation
of LXRA and ABCA1 expression is protective in vivo. Chawla et al. (2001) proposed that
PPARG coordinates a complex physiologic response to oxLDL that involves particle uptake,
processing, and cholesterol removal through ABCA1.

Fajas et al. (1997) used competitive RT-PCR to distinguish relative PPARG1 and
PPARG2 mRNA levels in tissues. They determined that PPARG2 is much less abundant than
PPARG1. The highest levels of PPARG are found in adipose tissue and large intestine, with
intermediate levels in kidney, liver, and small intestine, and barely detectable levels in muscle.
Western blot analysis showed that PPARG is expressed as a 60-kD protein. EMSA analysis
indicated that PPARG2 binds to and transactivates through a peroxisome proliferator response
element. The PPARG gene contains 9 exons and spans more than 100 kb. Through alternative
transcription start sites and alternate splicing, the mRNAs differ at their 5-prime ends, with
PPARG1 being encoded by 8 and PPARG2 by 7 exons. PPARG1 uses exons A1 and A2,
whereas PPARG2 uses exon B; both use exons 1 through 6.

Martin et al. (1998) reported that there are 3 PPARG isoforms which differ at their
5-prime ends, each under the control of its own promoter. PPARG1 and PPARG3, however, give
rise to the same protein, encoded by exons 1 through 6, because neither the A1 nor the A2 exon
is translated. By RNase protection analysis, Ricote et al. (1998) showed that in phorbol
ester-stimulated macrophage cell lines, a probe to PPARG1 protected a 218-nucleotide fragment of
PPARG1, but only a 174-nucleotide fragment of PPARG3. A PPARG2 probe protected a
common 104-nucleotide fragment of both PPARG1 and PPARG3. PPARG2 itself was not
expressed in the stimulated macrophages. PPARG1 and PPARG2 promoters are primarily used
in adipose tissue. The authors speculated that other inducing factors, such as cytokines MCSF
(120420) or GMCSF (138960), or oxidized LDL (see OLR1, 602601), might differentially
regulate expression of the 3 isoforms.

Lowell (1999) reviewed the role of PPARG in adipogenesis. 

Kersten et al. (2000) reviewed the roles of PPARs in health and disease. 

Tong et al. (2000) showed that murine GATA2 (137295) and GATA3 (131320) are
specifically expressed in white adipocyte precursors and that their downregulation sets the stage
for terminal differentiation. Constitutive GATA2 and GATA3 expression suppressed adipocyte 
differentiation and trapped cells at the preadipocyte stage. This effect was mediated, at least in 
part, through the direct suppression of PPARG. 

Mueller et al. (2000) showed that PPAR-gamma is expressed in human prostate
adenocarcinomas and cell lines derived from these tumors. Activation of this receptor with
specific ligands exerts an inhibitory effect on the growth of prostate cancer (176807) cell lines.
They showed that prostate cancer and cell lines do not have intragenic mutations in the PPARG
gene, although 40% of the informative tumors have hemizygous deletions of this gene. They
conducted a phase II clinical study in patients with advanced prostate cancer using troglitazone
(Rezulin), a PPAR-gamma ligand used for the treatment of type II diabetes. Oral treatment was
administered to 41 men with histologically confirmed prostate cancer and no symptomatic
metastatic disease. An unexpectedly high incidence of prolonged stabilization of
prostate-specific antigen (KLK3; 176820) was seen in patients treated with troglitazone. In addition, 1
patient had a dramatic decrease in serum prostate-specific antigen to nearly undetectable levels.
The findings suggested that PPAR-gamma may serve as a biologic modifier in human prostate
cancer and that its therapeutic potential should be further studied.

By somatic cell hybridization and linkage analysis, Greene et al. (1995) mapped the 
human PPARG gene to 3p25. Beamer et al. (1997) mapped the gene to 3p25 by fluorescence in 
situ hybridization. 

Meirhaeghe et al. (1998) detected a polymorphism corresponding to a silent C-to-T
substitution in exon 6 of the PPARG gene (601487.0009).

Since PPARG is a transcription factor that has a key role in adipocyte differentiation,
Ristow et al. (1998) investigated whether mutations of the gene encoding this factor predispose
people to obesity. They studied 358 unrelated German subjects, including 121 obese subjects,
looking for mutations in the PPARG2 gene at or near a site of serine phosphorylation at position
114 that negatively regulates transcriptional activity of the protein. Four of the 121 obese
subjects had a missense mutation in the PPARG2 gene that resulted in conversion of proline to
glutamine at position 115 (601487.0001), as compared with none of the 237 subjects of normal
weight. All the subjects with the mutant allele were markedly obese. Overexpression of the
mutant gene in murine fibroblasts led to the production of a protein in which the phosphorylation
of serine at position 114 was defective, as well as accelerated differentiation of the cells into
adipocytes and greater cellular accumulation of triglyceride than with the wildtype
PPAR-gamma-2. These effects were similar to those of an in vitro mutation created directly at the
ser114 phosphorylation site.
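
The carrier counts reported above (4 of 121 obese subjects versus 0 of 237 subjects of normal
weight) can be examined with a one-sided Fisher exact test. The following is a minimal sketch
for illustration only; it is not the analysis performed by Ristow et al., and the function name
and the choice of test are assumptions.

```python
from math import comb


def fisher_one_sided(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """Probability of seeing at least case_carriers carriers among the cases,
    given the table margins, under the hypergeometric null of no association."""
    n_cases = case_carriers + case_noncarriers
    n_ctrls = ctrl_carriers + ctrl_noncarriers
    n_carriers = case_carriers + ctrl_carriers
    total = n_cases + n_ctrls
    denom = comb(total, n_carriers)
    p_value = 0.0
    # Sum over tables at least as extreme as the observed one.
    for x in range(case_carriers, min(n_cases, n_carriers) + 1):
        p_value += comb(n_cases, x) * comb(n_ctrls, n_carriers - x) / denom
    return p_value


# Counts taken from the text: 4 carriers among 121 obese, 0 among 237 normal-weight subjects.
p = fisher_one_sided(4, 117, 0, 237)
print(f"one-sided p = {p:.4f}")
```

With so few carriers, a test of this kind gives only a rough indication of how unlikely the
observed distribution would be by chance.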

PPARG1 and PPARG2 have ligand-dependent and -independent activation domains.
PPARG2 has an additional 28 amino acids at the amino terminus that render its
ligand-independent activation domain 5- to 10-fold more effective than that of PPARG1. Insulin
stimulates the ligand-independent activation of PPARG1 and PPARG2; however, obesity and
nutritional factors influence only the expression of PPARG2 in human adipocytes. Deeb et al.
(1998) reported that a relatively common pro12-to-ala substitution in PPARG2 (601487.0002) is
associated with lower body mass index and improved insulin sensitivity among middle-aged and
elderly Finns. A significant odds ratio (4.35, P = 0.028) for the association of the pro/pro
genotype with type 2 diabetes was observed among Japanese Americans. The PPARG2 ala allele
showed decreased binding affinity to the cognate promoter element and reduced ability to
transactivate responsive promoters. These findings suggested that the PPARG2 pro12-to-ala
polymorphism may contribute to the observed variability in BMI and insulin sensitivity in the
general population.

Valve et al. (1999) investigated the frequencies of the pro12-to-ala polymorphism in exon
B and the silent CAC478-to-CAT polymorphism in exon 6 of the PPARG gene and their effects
on body weight, body composition, and energy expenditure in obese Finnish patients. The
frequencies of the ala12 allele in exon B and the CAT478 allele in exon 6 were not significantly
different between the obese and population-based control subjects (0.14 vs 0.13 and 0.19 vs 0.21,
respectively). The polymorphisms were associated with increased BMI, and the 5 women with
both ala12ala and CAT478CAT genotypes were significantly more obese compared with the
women having both pro12pro and CAC478CAC genotypes, and they had increased fat mass. The
authors concluded that the pro12-to-ala and CAC478-to-CAT polymorphisms in the PPARG
gene are associated with severe overweight and increased fat mass among obese women.

Sarraf et al. (1999) identified 4 somatic mutations (1 nonsense, 1 frameshift, and 2
missense) in the PPARG gene among 55 sporadic colon cancers (114500). Each mutation greatly
impaired the function of the PPARG protein. The 472delA mutation (601487.0003) resulted in
the deletion of the entire ligand binding domain. Q286P (601487.0004) and K319X
(601487.0005) retained a total or partial ligand binding domain but lost the ability to activate
transcription through a failure to bind to ligands. R288H (601487.0006) showed a normal
response to synthetic ligands but greatly decreased transcription and binding when exposed to
natural ligands. These data indicated that colon cancer in humans is associated with
loss-of-function mutations in the PPARG gene.

Barroso et al. (1999) reported 2 different heterozygous mutations in the ligand-binding
domain of PPARG in 3 subjects with severe insulin resistance (604367). In the PPAR-gamma
crystal structure, the mutations destabilized helix 12, which mediates transactivation. Consistent
with this, both receptor mutants were markedly transcriptionally impaired and, moreover, were
able to inhibit the action of coexpressed wildtype PPAR-gamma in a dominant-negative manner.
In addition to insulin resistance, all 3 subjects developed type 2 diabetes mellitus and
hypertension at an unusually early age. Barroso et al. (1999) concluded that their findings
represented the first germline loss-of-function mutations in PPAR-gamma and provided
compelling genetic evidence that this receptor is important in the control of insulin sensitivity,
glucose homeostasis, and blood pressure in man.

Kroll et al. (2000) reported that t(2;3)(q13;p25), a translocation identified in a subset of
human thyroid follicular carcinomas, results in fusion of the DNA-binding domains of the
thyroid transcription factor PAX8 (167415) to domains A to F of PPARG1. PAX8/PPARG1
mRNA and protein were detected in 5 of 8 thyroid follicular carcinomas but not in 20 follicular
adenomas, 10 papillary carcinomas, or 10 multinodular hyperplasias. PAX8/PPARG1 inhibited
thiazolidinedione-induced transactivation by PPARG1 in a dominant-negative manner. The
experiments demonstrated an oncogenic role for PPARG and suggested that PAX8/PPARG1
may be useful in the diagnosis and treatment of thyroid carcinoma.

ANIMAL MODEL 

The nuclear hormone receptor PPARG promotes adipogenesis and macrophage 
differentiation and is a primary pharmacologic target in the treatment of type II diabetes. Barak 
et al. (1999) showed that PPARG gene knockout in mice resulted in 2 independent lethal phases.
Initially, PPARG deficiency interfered with terminal differentiation of the trophoblast and
placental vascularization, leading to severe myocardial thinning and death by E10.0.
Supplementing PPARG null embryos with wildtype placentas via aggregation with tetraploid
embryos corrected the cardiac defect, implicating a previously unrecognized dependence of the
developing heart on a functional placenta. A tetraploid-rescued mutant surviving to term
exhibited another lethal combination of pathologies, including lipodystrophy and multiple
hemorrhages. These findings both confirmed and expanded the current known spectrum of 
physiologic functions regulated by PPARG. 

Kubota et al. (1999) generated homozygous PPARG-deficient mouse embryos, which
died at 10.5 to 11.5 days postcoitum due to placental dysfunction. Heterozygous
PPARG-deficient mice were protected from the development of insulin resistance due to adipocyte
hypertrophy under a high-fat diet. These phenotypes were abrogated by PPARG agonist
treatment. Heterozygous PPARG-deficient mice showed overexpression and hypersecretion of
leptin despite the smaller size of adipocytes and decreased fat mass, which may explain these
phenotypes at least in part. This study revealed an unpredicted role for PPARG in high-fat
diet-induced obesity due to adipocyte hypertrophy and insulin resistance, which requires both alleles
of PPARG.

Rosen et al. (1999) demonstrated that mice chimeric for wildtype and PPARG null cells
showed little or no contribution of null cells to adipose tissue, whereas most other organs
examined did not require PPARG for proper development. In vitro, the differentiation of
embryonic stem cells into fat was shown to be dependent on PPARG gene dosage. These data
provided direct evidence that PPARG is essential for the formation of fat.

The thiazolidinedione (TZD) class of insulin-sensitizing, antidiabetic drugs interacts with 
PPAR-gamma. Miles et al. (2000) conducted metabolic studies in PPARG gene knockout mice. 
Because homozygous PPARG-null mice die in development, they studied glucose metabolism in 
mice heterozygous for the mutation. They identified no statistically significant differences in
body weight, basal glucose, insulin, or free fatty acid levels between the wildtype and
heterozygous groups. Nor was there a difference in glucose excursion between the groups of
mice during oral glucose tolerance tests. However, insulin concentrations of the wildtype group
were greater than those of the heterozygous deficient group, and insulin-induced increase in
glucose disposal rate was significantly increased in the heterozygous mice. Likewise, the
insulin-induced suppression of hepatic glucose production was significantly greater in the heterozygous
mice than in wildtype mice. Taken together, these results indicated that, counterintuitively,
although pharmacologic activation of PPAR-gamma improves insulin sensitivity, a similar effect
is obtained by genetically reducing the expression levels of the receptor.

ALLELIC VARIANTS (selected examples) 

.0001 OBESITY, SEVERE [PPARG, PRO115GLN]

In 4 German subjects with severe obesity (601665), Ristow et al. (1998) identified a
pro115-to-gln mutation of the PPAR-gamma-2 gene. Significantly, the mutation was in the
codon immediately adjacent to a serine-114 phosphorylation site. The pro115-to-gln mutation
occurs in exon 6, which is shared by all 3 forms of PPAR-gamma (Wang et al., 1999).

.0002 PPARG2 POLYMORPHISM C/G [PPARG, PRO12ALA]

OBESITY, PROTECTION AGAINST DIABETES MELLITUS, TYPE II,
SUSCEPTIBILITY TO, INCLUDED

Because the product of the PPARG gene is a nuclear
receptor that regulates adipocyte differentiation and possibly lipid metabolism and insulin
sensitivity, Yen et al. (1997) screened for mutations in the entire coding region of the PPARG
gene in 26 diabetic Caucasians with or without obesity (601665). They found a CCG (pro)-to-GCG
(ala) missense mutation at codon 12 (P12A). The allele frequency of the mutation varied
from 0.12 in Caucasian Americans to 0.10 in Chinese. Beamer et al. (1998) noted that the amino
acid position of the P12A mutation is within the domain of PPAR-gamma-2 that enhances
ligand-independent activation, that the substitution of alanine for proline is nonconservative, and
that this amino acid change might cause a significant alteration in protein structure. To test the
hypothesis that individuals with the variant are at increased genetic risk for obesity and/or insulin
resistance, they performed association studies in 2 independently recruited cohorts of unrelated,
nondiabetic, adult Caucasian subjects. They found that the P12A mutation was associated with
higher BMI in the 2 cohorts, suggesting that the mutation may contribute to genetic susceptibility
for the multifactorial disorder of obesity.
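
The codon changes named in this section (CCG to GCG at codon 12, described as missense, and
CAC to CAT at codon 478, described as silent) can be checked against the standard genetic code.
The following is a minimal sketch; the function and variable names are hypothetical, and the
standard nuclear code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (ref_aa, alt_aa, kind) where kind is 'silent', 'nonsense' or 'missense'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "silent"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


print(classify_substitution("CCG", "GCG"))  # codon 12 change described in the text
print(classify_substitution("CAC", "CAT"))  # codon 478 change described in the text
```

Under the standard code, the first change replaces proline with alanine and the second leaves
histidine unchanged, in agreement with how the two polymorphisms are described.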

Deeb et al. (1998) studied a polymorphism of the PPARG gene, a C-to-G variant that
created an HgaI restriction site and predicted the substitution of alanine for proline at position 12
in the PPARG2-specific exon B. In a group of Finnish men and women with a PPARG2 ala
allele frequency of 0.12, they found that this allele was associated with lower fasting insulin
levels (P = 0.011) and BMI (P = 0.027) and higher insulin sensitivity (P = 0.047). This
association was independent of sex. The findings were verified by studies in a group of elderly
subjects. They also studied the association of the pro12-to-ala substitution in PPARG2 with type
2 diabetes (125853) in a group of second-generation Japanese-American (Nisei) men and women
that included individuals with type 2 diabetes, impaired glucose tolerance, and normal controls.
The ala allele was less frequent among subjects with type 2 diabetes (0.022) than among normal
controls (0.092). The odds ratio for association of pro/pro with diabetes was significant (4.35, P
= 0.028), whereas the frequency of the ala allele among impaired glucose tolerance subjects was
intermediate (0.039). Deeb et al. (1998) suggested that the lower transactivation capacity of the
ala variant of PPARG2 underlies the association of this allele with lower BMI and higher insulin
sensitivity. The ala isoform may lead to less efficient stimulation of PPARG target genes and
predispose to lower levels of adipose tissue mass accumulation, which in turn may be
responsible for improved insulin sensitivity.

Altshuler et al. (2000) evaluated 16 published genetic associations to type 2 diabetes and
related subphenotypes using a family-based design to control for population stratification, and
replication samples to increase power. They confirmed only 1 association, that of the common
pro12-to-ala polymorphism in PPAR-gamma with type 2 diabetes. By analyzing over 3,000
individuals, they found a modest (1.25-fold) but significant (P = 0.002) increase in diabetes risk
associated with the more common proline allele (approximately 85% frequency). Because the
risk allele occurs at such high frequency, its modest effect translates into a large
population-attributable risk, influencing as much as 25% of type 2 diabetes in the general population.

.0003 CANCER OF COLON [PPARG, 1-BP DEL, 472A] 

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, a 1-bp deletion at nucleotide 472, which resulted in a frameshift.

.0004 CANCER OF COLON [PPARG, GLN286PRO] 

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, an A-to-G transition at nucleotide 857, which resulted in a
gln286-to-pro substitution.

.0005 CANCER OF COLON [PPARG, LYS319TER]

In a sporadic colon cancer (114500), Sarraf et al. (1999) identified a somatic mutation in
the PPARG gene, an A-to-T transversion at nucleotide 955, which resulted in a lys319-to-ter
substitution.

.0006 CANCER OF COLON [PPARG, ARG288HIS] 

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, a G-to-A transition at nucleotide 863, which resulted in an
arg288-to-his substitution.
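
By way of non-limiting illustration, the relationship between the nucleotide positions and the codon numbers recited for the somatic mutations above can be sketched as follows. The sketch assumes that nucleotide numbering begins at the first base of the initiation codon, which the text does not state explicitly; the function names `codon_position` and `substitution_class` are hypothetical and are introduced here for illustration only.

```python
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def codon_position(nucleotide: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon).

    Assumption: position 1 is the first base of the initiation codon.
    """
    if nucleotide < 1:
        raise ValueError("nucleotide positions are 1-based")
    codon_index, offset = divmod(nucleotide - 1, 3)
    return codon_index + 1, offset + 1


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # Positions recited in the text for the substitutions at codons 286, 288 and 319.
    for position in (857, 863, 955):
        print(position, codon_position(position))
    print(substitution_class("G", "A"))
    print(substitution_class("A", "T"))
```

Under the stated numbering assumption, positions 857, 863 and 955 fall in codons 286, 288 and 319, respectively, which agrees with the residue numbers given for these variants.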

.0007 DIABETES MELLITUS, INSULIN-RESISTANT, WITH ACANTHOSIS
NIGRICANS AND HYPERTENSION [PPARG, PRO467LEU]

In a patient with severe insulin resistance, type 2 diabetes mellitus, and hypertension
(604367) who had been diagnosed in her twenties, Barroso et al. (1999) detected a C-to-T
transition in the PPARG gene resulting in a proline-to-leucine mutation at codon 467 (P467L).
Her son, aged 30 years, who also had a history of early-onset diabetes and hypertension, was also
heterozygous for the P467L mutation. All other family members, including both parents of the
proband, none of whom were known to have diabetes or hypertension, were homozygous for
wildtype receptor sequence. Nonpaternity was excluded, indicating a de novo appearance of the
mutation in the proband.

.0008 DIABETES MELLITUS, INSULIN-RESISTANT, WITH ACANTHOSIS
NIGRICANS AND HYPERTENSION [PPARG, VAL290MET]

In a 15-year-old patient with primary amenorrhea, hirsutism, acanthosis nigricans,
elevated blood pressure, and markedly elevated fasting and postprandial insulin levels (604367),
Barroso et al. (1999) identified a G-to-A transition in the PPARG gene resulting in a
valine-to-methionine mutation at codon 290 (V290M). By age 17 the patient had developed type 2
diabetes and had hypertension which required treatment with beta-blockers. Her clinically
unaffected mother and sister were both wildtype at this locus; screening of the deceased father
was not possible.

.0009 PPARG POLYMORPHISM C-T [PPARG, 161C-T]

Meirhaeghe et al. (1998) reported a 161C-T substitution in exon 6 of the PPARG gene.
Since PPAR-gamma is a transcription factor implicated in adipocyte differentiation and in lipid
and glucose metabolism, they analyzed the relationships between this genetic polymorphism and
various markers of the obesity phenotype in a representative sample of 820 men and women
living in northern France. The frequencies of the C and T alleles were 0.860 and 0.140,
respectively. In the whole sample, no association of the polymorphism with the markers tested
was observed, but a statistically significant interaction (P less than 0.03) existed between this
polymorphism and body mass index (BMI) for plasma leptin levels. Obese subjects bearing at
least one T allele had higher plasma leptin levels than subjects who did not. This effect existed in
both genders, despite the higher plasma leptin levels observed in women. Thus, for a given leptin
level, the BMI was relatively lower in obese subjects carrying at least one T allele than in obese
CC homozygotes.

Wang et al. (1999) studied this polymorphism in 647 Australian Caucasian patients aged
65 years or less, with or without angiographically documented coronary artery disease. The
frequencies of the CC, CT, and TT genotypes were 69.8%, 27.7%, and 2.5%, respectively, and
the T allele frequency was 0.163. These frequencies were in Hardy-Weinberg equilibrium and did not
differ between men and women. Wang et al. (1999) found that the T allele carriers (CT and
TT genotypes) had significantly reduced coronary artery disease risk compared to the CC
homozygotes, with an odds ratio of 0.457. Association with obesity (601665) was not found in
these patients. The authors interpreted this to indicate that the PPARG gene may have a
significant role in atherogenesis, independent of obesity and of lipid abnormalities, possibly via a
direct local vascular wall effect.
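
For illustration only, the allele frequency and the genotype frequencies expected under Hardy-Weinberg equilibrium can be recomputed from the genotype percentages recited above. The following is a minimal sketch; the function name `hardy_weinberg` is hypothetical, and the calculation is the standard textbook one rather than the procedure actually used by the cited authors, which is not described here.

```python
from typing import Tuple


def hardy_weinberg(cc: float, ct: float, tt: float) -> Tuple[float, Tuple[float, float, float]]:
    """Return the T allele frequency and the expected (CC, CT, TT) genotype fractions.

    Inputs may be counts or percentages; only their proportions matter.
    """
    total = cc + ct + tt
    if total <= 0:
        raise ValueError("genotype totals must be positive")
    # Each TT individual carries two T alleles, each CT individual carries one.
    q = (tt + ct / 2.0) / total
    p = 1.0 - q
    expected = (p * p, 2.0 * p * q, q * q)
    return q, expected


if __name__ == "__main__":
    # Genotype percentages recited in the text: CC 69.8%, CT 27.7%, TT 2.5%.
    t_frequency, expected = hardy_weinberg(69.8, 27.7, 2.5)
    print(f"T allele frequency: {t_frequency:.4f}")
    for name, fraction in zip(("CC", "CT", "TT"), expected):
        print(f"expected {name}: {100 * fraction:.1f}%")
```

The recomputed T allele frequency is about 0.16, in line with the recited value, and the expected genotype fractions lie close to the observed percentages, consistent with the statement that the frequencies were in Hardy-Weinberg equilibrium.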

Using a subtractive cloning strategy to identify downstream targets of peroxisome
proliferator-activated receptor-gamma (PPARG; 601487), and by screening cDNA libraries,
Yoon et al. (2000) isolated mouse and human cDNAs encoding PGAR. The 406-amino acid,
60-kD human PGAR protein, which shares 75% amino acid identity with the mouse protein, is a
member of the angiopoietin family of secreted proteins and bears highest similarity to
angiopoietin-2 (ANGPT2; 601922). Like other members of this family, PGAR contains a
predicted coiled-coil quaternary structure, and the authors hypothesized that PGAR may form
multimeric or other higher-order structures. PGAR has a secretory signal peptide, 3 potential
N-glycosylation sites, and 4 cysteines that may be available for intramolecular disulfide bonding.
Northern blot analysis detected a 2-kb PGAR transcript that was highly enriched in white fat and
placenta. In situ hybridization analysis revealed expression of mouse Pgar at low levels in most
organs and connective tissue at embryonic day 13.5 (E13.5). Between E15.5 and E18.5, strongest
expression of Pgar was in brown fat. Northern blot analysis detected elevated levels of Pgar
expression in mouse models of obesity and diabetes. Alterations in nutrition and leptin (164160)
administration in mice modulated Pgar expression in vivo. Yoon et al. (2000) demonstrated that
PPARG ligand-induced transcription of PGAR follows a rapid time course typical of
immediate-early genes and occurs in the absence of protein synthesis. Using a culture model system, they
observed that induction of the PGAR transcript coincides with hormone-dependent adipocyte
differentiation. Yoon et al. (2000) concluded that PGAR is a bona fide target of PPARG and may
have a role in regulation of systemic lipid metabolism or glucose homeostasis.

Kersten et al. (2000) identified mouse Pgar, which they called Fiaf (fasting-induced
adipose factor), using a subtractive hybridization assay to identify PPARA (170998) target
genes. Northern blot analysis detected expression of Fiaf in mouse white and brown adipose
tissue, with weak expression in lung, kidney, and liver. Using a combination of wildtype, Ppara
mutant, and Pparg mutant mice, Kersten et al. (2000) demonstrated that mRNA expression is
stimulated by PPARA in liver and by PPARG in white adipose tissue. Expression of Fiaf was
upregulated in liver and white adipose tissue during fasting. Western blot analysis showed that
the abundance of Fiaf in plasma decreased with high fat feeding, an effect directly opposite that
observed with leptin.

By radiation hybrid analysis, Yoon et al. (2000) mapped the PGAR gene to 19p13.3.

The DNA and protein sequences for the novel Angiopoietin-like gene are reported here
as CuraGen Acc. No. CG57051-04.

Similarities

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 716 of 733 bases (97%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (Table 23). The full amino acid sequence of the protein of
the invention was found to have 181 of 183 amino acid residues (98%) identical to, and 182 of
183 amino acid residues (99%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE PROTEIN
PP1158) (Table 24).
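
As a non-limiting illustration of how identity figures of this kind relate to an alignment, the following sketch counts identical columns in a pair of gapped, aligned sequences and converts a count into a whole-number percentage. The function names `count_identities` and `truncated_percent` are hypothetical. The use of truncation rather than rounding is an assumption inferred from the recited figures (for example, 716 of 733 reported as 97%), not a documented property of the search tool.

```python
from typing import Tuple

GAP = "-"


def count_identities(query: str, subject: str) -> Tuple[int, int]:
    """Return (identical columns, total columns) for two aligned sequences.

    Both strings must already be aligned, with gaps written as '-'.
    A column counts as identical only if both residues match and neither is a gap.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same number of columns")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != GAP)
    return identical, len(query)


def truncated_percent(identical: int, columns: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed reporting convention)."""
    if columns <= 0:
        raise ValueError("an alignment must have at least one column")
    return (100 * identical) // columns


if __name__ == "__main__":
    # A short made-up alignment, used only to exercise the column counting.
    print(count_identities("ACGT-A", "ACGTTA"))
    # Counts recited in the text for the nucleic acid and protein comparisons.
    print(truncated_percent(716, 733))
    print(truncated_percent(181, 183))
    print(truncated_percent(182, 183))
```

With truncation, the recited counts of 716/733, 181/183 and 182/183 give 97%, 98% and 99%, matching the percentages stated above.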

A multiple sequence alignment is given in Table 26, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Please note this sequence represents a splice form of Angiopoietin as
indicated in positions 184L to 347G and SNPs: Q24R and G25S.

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:

Model         Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
fibrinogen_C    1/1     184    236 ..    204    272 .]   31.7  4.1e-08

IPR002181; Fibrinogen_C

Fibrinogen [1], the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds.

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx

'C': conserved cysteine involved in a disulfide bond.

Such a domain has been recently found in other proteins, which are listed below.

Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins of
about 260 amino acids which have a fibrinogen beta/gamma C-terminal domain.
In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in the
regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.
In the C-terminus of a mammalian T-cell specific protein of unknown function.
In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be
involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other 
proteins known to contain this/these domain(s) and similar to the properties of these domains. 

Chromosomal information:

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.

Tissue expression

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Heart, Aorta, Coronary Artery, Umbilical Vein, Adrenal
Gland/Suprarenal gland, Pancreas, Islets of Langerhans, Thyroid, Pineal Gland, Parotid Salivary
glands, Liver, Small Intestine, Duodenum, Colon, Bone Marrow, Lymph node, Bone, Cartilage,
Synovium/Synovial membrane, Skeletal Muscle, Brain, Thalamus, Pituitary Gland, Amygdala,
Hippocampus, Spinal Cord, Mammary gland/Breast, Ovary, Placenta, Uterus, Vulva, Prostate,
Testis, Lung, Kidney, Retina, Skin, Foreskin. Expression information was derived from the
tissue sources of the sequences that were included in the derivation of the sequence of CuraGen
Acc. No. CG57051-04.

Cellular Localization and Sorting

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 27. Although PSORT suggests that the Angiopoietin-like protein may be localized in
the cytoplasm, the protein of CuraGen Acc. No. CG57051-04 predicted here is similar to the
Fibrinogen family, some members of which are secreted. Therefore it is likely that this novel
Angiopoietin-like protein is localized to the same sub-cellular compartment.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 20, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Fig. 1 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-04, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 3% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 20. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 20 while still encoding
a protein that maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 2% of the amino acid
residues may be so changed.

Chimeric and Fusion Proteins

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-04 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-04 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion
proteins described in the preceding paragraph.

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Fibrinogen family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes; obesity;
colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; cone-rod retinal dystrophy-2; DNA ligase I deficiency;
glutaricaciduria, type IIB; liposarcoma; and myotonic dystrophy, as well as other diseases, disorders
and conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.



Table 23. BLASTN search using CuraGen Acc. No. CG57051-04.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:79)
Length = 1943

Plus Strand HSPs:

Score = 3468 (520.3 bits), Expect = 7.8e-202, Sum P(2) = 7.8e-202
Identities = 716/733 (97%), Positives = 716/733 (97%), Strand = Plus / Plus

Query:    2 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 61
Sbjct:   20 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 79

Query:   62 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 121
Sbjct:   80 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 139

Query:  122 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 181
Sbjct:  140 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 199

Query:  182 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCT-AGATCTGGACCCGTGCA 240
Sbjct:  200 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGC-GGACCCGTGCA 258

Query:  241 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 300
Sbjct:  259 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 318

Query:  301 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 360
Sbjct:  319 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 378

Query:  361 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 420
Sbjct:  379 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 438

Query:  421 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480
Sbjct:  439 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 498

Query:  481 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 540
Sbjct:  499 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 558

Query:  541 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 600
Sbjct:  559 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 618

Query:  601 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660
Sbjct:  619 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 678

Query:  661 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGAG-GCTGGTGGTTTGGCA 719
Sbjct:  679 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGGCTGCCCAGGGATTGCCA 738

Query:  720 CCTGCAGCCATTCCA 734
Sbjct:  739 G--G-AGCTGTTCCA 750

Score = 1182 (177.3 bits), Expect = 7.8e-202, Sum P(2) = 7.8e-202
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Plus / Plus

Query:  693 GCCTGCACCG-AGGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTAC 751
Sbjct: 1203 GCCT-CTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTAC 1261

Query:  752 TTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGG 811
Sbjct: 1262 TTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGG 1321

Query:  812 CGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAG 871
Sbjct: 1322 CGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAG 1381

Query:  872 GCAGCCTCCTAGCGTCCTGGCTGGGCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTG 931
Sbjct: 1382 GCAGCCTCCTAGCGTCCTGGCTGGGCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTG 1441

Query:  932 GCTCTG 937
Sbjct: 1442 GCTCTG 1447



Table 24. BLASTP search using the protein of CuraGen Acc. No. CG57051-04.

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (SEQ ID NO:80)
Length = 406

Score = 929 (327.0 bits), Expect = 4.4e-126, Sum P(2) = 4.4e-126
Identities = 181/183 (98%), Positives = 182/183 (99%)

Query:   1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
Sbjct:   1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:  61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
Sbjct:  61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query: 121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180
Sbjct: 121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

Query: 181
Sbjct: 181

Score = 333 (117.2 bits), Expect = 4.4e-126, Sum P(2) = 4.4e-126
Identities = 60/62 (96%), Positives = 60/62 (96%)

Query: 181 LHRGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEA 240
Sbjct: 345 LSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEA 404

Query: 241 AS 242
Sbjct: 405 AS 406

Score = 49 (17.2 bits), Expect = 2.4e-33, Sum P(2) = 2.4e-33
Identities = 14/40 (35%), Positives = 20/40 (50%)

Query:   1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDE 40
Sbjct: 293 LGGEDTA-YSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQ 331



Table 25. BLASTN Identity search of CuraGen Corporation's Human SeqCalling database using
CuraGen Acc. No. CG57051-04.

>s3aq:230527544, 2394 bp. (SEQ ID NO:81)
Length = 2394

Minus Strand HSPs:

Score = 3468 (520.3 bits), Expect = 1.2e-202, Sum P(2) = 1.2e-202
Identities = 716/733 (97%), Positives = 716/733 (97%), Strand = Minus / Plus

Query:  734
Sbjct: 1645

Query:  675
Sbjct: 1702

Query:  615                                                              556
Sbjct: 1762

Query:  555
Sbjct: 1822 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 1881

Query:  495
Sbjct: 1882

Query:  435 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 376
Sbjct: 1942 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 2001

Query:  375 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 316
Sbjct: 2002 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 2061

Query:  315 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 256
Sbjct: 2062 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 2121

Query:  255 CGCGGCGACTTGGACTGCACGGGTCCAGATCT-AGCGCTCAGTAGCACGGCGGTGGCGGC 197
Sbjct: 2122 CGCGGCGACTTGGACTGCACGGGTCC-GCCCTGAGCGCTCAGTAGCACGGCGGTGGCGGC 2180

Query:  196 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 137
Sbjct: 2181 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 2240

Query:  136 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 77
Sbjct: 2241 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 2300

Query:   76 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 17
Sbjct: 2301 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 2360

Query:   16 CGTGTGAGGATCCGC 2
Sbjct: 2361 CGTGTGAGGATCCGC 2375

Score = 1182 (177.3 bits), Expect = 1.2e-202, Sum P(2) = 1.2e-202
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:  937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
Sbjct:  948 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1007

Query:  877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
Sbjct: 1008 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1067

Query:  817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
Sbjct: 1068 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1127

Query:  757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
Sbjct: 1128 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 1187

Query:  698 GCAGGC 693
Sbjct: 1188 G-AGGC 1192



60 



>s3aq: 2 18296061 , 1862 bp. (SEQ id no:82) 
Length = 1862 

Minus Strand HSPs: 



65 



Score = 3444 (5 16.7 bits), Expect = 1.8e-201, Sum P(2) = 1.8e-201 
Identities - 714/733 (97%), Positives = 714/733 (97%), Strand = Minus / Plus 

Query: 734 TGGAATGGCTGCAGGTGCCAAACCACCAGCCTC-GGTGCAGGCGGCTGACATTGTGAGCC 676 

Mill III I II III M M I IIMIIIIMMIIIIIIIIIIMM 

Sbjct: 1133 TGGAACAGCTCCTGG CAATCCCTGGGCAGCCGGTGCAGGCGGCTGACATTGTGAGCC 1189 



99 
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40 



45 



50 



55 



60 



65 



Query: 


675 


Sb j Ct : 


1190 


Query: 


615 


Sbjct : 


1250 


Query : 


555 


Sbj ct : 


1310 


Query: 


495 


Sbj ct : 


1370 


Query: 


435 


Sbjct: 


1430 


Query: 


375 


Sbjct: 


1490 


Query: 


315 


Sbjct: 


1549 


Query : 


255 


Sbjct: 


1609 


Query: 


196 


Sbjct: 


1668 


Query: 


136 


Sbj ct : 


1728 


Query: 


76 


Sbjct: 


1788 


Query: 


16 


Sbjct: 


1848 


Score = 1 


182(1 


Identities 


-242/ 


Query: 


937 


Sbjct : 


436 


Query : 


877 


Sbj ct : 


496 


Query: 


817 


Sbjct : 


556 


Query: 


757 


Sbjct: 


616 


Query: 


698 



GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 616 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIII 

GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 1249 

TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 556 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 1309 



TGCTGCTTCTCCAGGTGCC6CTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 

IIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 

CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTCAGGGTCCACCCGGCTC 

llllllilllllllllllllllllllllllllllllllllll lllllllllllllllll 
CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCCCAGGGTCCACCCGGCTC 



496 



1369 



436 



1429 



TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 376 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIMIIMMillllM 

TCAGGGGCT7VACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 1489 

CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 316 

lllllllll IIIIIIIIIIIIIIIIMIIIIIIIMIIIIIIIIIIIMIIIIIIIIII 

CTCAGGCGC-GCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 1548 

CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 256 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 1608 

CGCGGCGACTTGGACTGCACGGGTCCAGATCT - AGCGCTCAGTAGCACGGCGGTGGCGGC 197 

llllllllllllllllllllllllll I II lllllllllllllllllllllllllll 
CGCGGCGACTTGGACTGCACGGGTCC - GCCCTGAGCGCTCAGTAGCACGGCGGTGGCGGC 1667 

GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 137 

IIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIMIIIIIIIIIIIIlllllllllllll I 
GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 1727 

AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 7 7 

lllllll III I lllllllll I III I II lllllllllllllllll II Ml lllllllll I I 
AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 1787 

GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 1 7 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 1847 

CGTGTGAGGATCCGC 2 

lllllllllllllll 
CGTGTGAGGATCCGC 1862 



CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878 

IIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIMIIIIIIIIM 

CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 495 
GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818 

MIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIMIIIIIIII 

GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 555 
GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758 

IIIIIIIIIIMIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 615 

GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG - GT 699 

llllllllllllllllllllllllllllllllllllllllllllllllllllllll I 
GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 675 



698 GCAGGC 693 






I MM 

Sbjct: 676 G-AGGC 680 

>s3aq:217940431 Category E: , 530 bp. (SEQ ID NO:83)
Length = 530

Minus Strand HSPs:

Score = 1800 (270.1 bits), Expect = 1.2e-75, P = 1.2e-75

Identities = 384/403 (95%), Positives = 384/403 (95%), Strand = Minus / Plus

AGGCTTGGCCACC - TCATGGTCTAGGTG - CTT - GTGGTCCAG - GAGGCCAAACTGGCTTT 576 

II I III I II III I II I III II II III lllllll llllllll 

15 Sbjct: 128 AGCCCTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTT 185 






Query : 


631 


Sbjct : 


128 


Query: 


575 


Sbjct: 


186 


Query: 


515 


Sbjct: 


246 


Query: 


455 


Sbjct: 


306 


Query: 


395 


Sb j Ct : 


366 


Query: 


335 


Sb j ct : 


426 


Query: 


275 


Sbjct: 


486 



GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 516 

MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 245 
GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 456 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 305



Query: 455 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 396

MMMMMMMMIMMIMMMM MMMMMMMMMMMMIMIM 

CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 365 
GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 336 

30 ^ ' MMMMMMMMMMMMMMMI I MMMMMMMMMMMMM 

GACAGGCGGACCCGCACGCGCTCAGGCGCCGTTTCAGCGCGCTCAGCTGACTGCGGGTGC 425 
GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 276 

MIMMMMMMMMMMMMMMMMMIMMMMMMMMMMM 

Sbjct: 426 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 485
TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 231

MMMMMIMMMMMMMMMMMMMMMMM 

TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 530 



>s3aq:230121563 , 788 bp. (SEQ ID NO:84) 
Length = 788 



Minus Strand HSPs:

Score = 1182 (177.3 bits), Expect = 6.4e-48, P = 6.4e-48

Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query: 937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878

MMMMIMMMMMMMMMMMMMMMMMMMMMMMMMI 

CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 230 
GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818 

55 ~ ~ M M M M M M M M M M M M M M M I M M M M M M M M M M M I M M M 

GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 290
GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758 

IMMMMMMMMMMMMMMIMMMMMMMMMMMMMMM 

Sbjct: 291 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 350
GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG - GT 699

MMMMMMMMMMMMIMMMMMMMMMMMMMMIM I 

GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 410 



Query: 


937 


Sbjct: 


171 


Query: 


877 


Sbjct: 


231 


Query: 


817 


Sbjct: 


291 


Query: 


757 


Sbjct : 


351 


Query: 


698 


Sb j ct : 


411 



GCAGGC 693 
I Mil 



>s3aq:217939973 , 631 bp. (SEQ ID NO:85)
Length = 631

Minus Strand HSPs:

Score = 1182 (177.3 bits), Expect = 8.0e-48, P = 8.0e-48
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 164 

Query: 877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818

llllllllllllllllllllilllllllllllllllllllllllllllllllllllllll 
GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 224 

GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758 

20 ~ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 284 

GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG - GT 699 

llllllllllllllllllllllllllllllllllllllllllllllllllllllll I 
Sbjct: 285 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 344



Query: 


937 


Sbjct: 


105 


Query: 


877 


Sbjct: 


165 


Query: 


817 


Sbjct: 


225 


Query: 


757 


Sb j Ct : 


285 


Query: 


698 


Sbjct: 


345 



III 



>s3aq: 217939964 , 328 bp. (SEQ ID NO:86) 

Length = 328

Plus Strand HSPs:

Score = 777 (116.6 bits), Expect = 3.0e-29, P = 3.0e-29
Identities = 157/159 (98%), Positives = 157/159 (98%), Strand = Plus / Plus

Query:   779 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 838
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:     1 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 60

Query:   839 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 898
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    61 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 120

Query:   899 TGGTCCCAGGCCCACGAAAGACGGTGACTCTTGGCTCTG 937
             |||||||||||| |||||||||||||||||||||||| |
Sbjct:   121 TGGTCCCAGGCCAACGAAAGACGGTGACTCTTGGCTCCG 159



Table 26. ClustalW alignment of CG57051-04 protein with related proteins. 
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CG57051-04 
CG57051-02 
Q9HBV4 
CG57051-03 

CG57051-04 
CG57051-02 
Q9HBV4 
CG57051-03 

CG57051-04 
CG57051-02 
Q9HBV4 
CG57051-03 

CG57051-04 
CG57051-02 
Q9HBV4 
CG57051-03 



vISGAPTA GAALMLCAATAVLLSAjgCTPVQSKSPR: 

visGAPTA gaalmlcaatavllsa^gpvqskspr: 
visGAPTA gaalmlcaatavllsasBjpvqskspr: 

'.'ISGAPTA GAALMLCAATAVLLSaSSsPVQSICSPR 



SWDEMNVLAHG 
SWDEMNVLAHG 
SWDEMNVLAHG 
S^TOEMHVLAHG 




RTRSQL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLH S LQTQLKAQN S R I QQL F 
RTRSQL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLHSLQTQLKAQNSRIQQLF 
RTRSQL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLHSLQTQLKAQNSRIQQLF 
RTRSOL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLHSLQTQLKAQNSRIQQLF 



HKVAQQQ RHLEKQHL RIQHLQSQFGLLDHICHLDHEVAKPA RRKRLPEMAQP VDP AHN V S R 
HKVAQQQ RHLEKQHL RIQHLQSQFGLLDHICHLDHEVAKPA RRKRLPEMAQP VDP AHN V S R 
HKVAQQQ RHLEKQHL RIQHLQSQFGLLDHICHLDHEVAKPA RRKRLPEMAQP VDP AHN V S R 
H KVAQQO RHLE KQHL R I QHLQSQFGLLDH ICHLDHE VAKP A RRKRLPEMAQPVDPAHN VS R 



-PRO CQELFQVGERQSGLFEIQPQGSPPFLVNCKMTS D 
- - H 



JGWTVIQRRHDGSiJ2DFNRP 

:jgwtviqrrhdgsvdfnrp 

rTG^ATTVIQRRHDGSVDFNRP 



CG57051-04 
CG57051-02 
Q9HBV4 

CG57051-03 BJ|a 



AYICAGFGDPHGEFWLGLEKVHS I TGD RNS RL AVQL RDWDGN A ELLQ FS VHLGGEDTAY 
WEAYKAGFGDPHGEFWLGLEKVHS ITGD RNS RLAVQL RDWDGNA ELLQFS VHLGGEDTAY 
AYKAG FGDPHGEFWLGLEKVHS I TGD RNS RLAVQL RDWD jSw A ELL Q FS VHL GGEDT AY 



CG57051-04 
CG57051-02 
Q9HBV4 
CG57051-03 

CG57051-04 
CG57051-02 
Q9HBV4 
CG57051-03 

CG57051-04 
CG57051-02 
Q9HBV4 
CG57051.03 



SLQLTAP VAGQLGATTVPPSGLSVPFSTV/DQDHDLRRDKN CAKS L 
S LQLTA P V AGQL GATT VPP S GLS VP FS TWDQDHDLRRD KN C AK S L 
S LQLTA P V AGQL GATT VPP S GLS VP FgfrWDQDHDLRR D KN C AK S L 



S AP S VAQRPDHVP S P 



ltpaS 



SWWFGTC S HS N LNGQYFRS I PQQ RQKLKKG I FWKT WRG R YYP L QATTML I QPMAA 
GVsAVFGTC S HS N LNGQYFRS I PQQ RQKLICKG I FWKT WRG R YYP L QATTML I QPMAA 
S GGWWFGTC S HS N LNGQYFRS I PQQ RQKLKKG I FWKT WRG R YYP L QATTML I QPMAA 
S GGWWFGTC S HS N LNGQYFRS I PQQ RQICLKKG I FWKT WRG R YYP L QATTML I QPMAA 



Information for the ClustalW proteins:

Accno        Common Name                                  Length
CG57051-04   novel Angiopoietin-like protein              242
CG57051-02   Angiopoietin Related protein / PPAR-gamma    386
Q9HBV4       ANGIOPOIETIN-LIKE PROTEIN PP1158.            406
CG57051-03   Angiopoietin-like protein-isoform 3          368



In the alignment shown above, black outlined amino acid residues indicate residues 
identically conserved between sequences (i.e., residues that may be required to preserve 
structural or functional properties); amino acid residues with a gray background are similar to 
one another between sequences, possessing comparable physical and/or chemical properties 
without altering protein structure or function (e.g. the group L,V, I, and M may be considered 
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similar); and amino acid residues with a white background are neither conserved nor similar 
between sequences. 



Table 27. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-04.

endoplasmic reticulum (membrane) --- Certainty=0.8200(Affirmative) < succ>
plasma membrane --- Certainty=0.1900(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.1701(Affirmative) < succ>
endoplasmic reticulum (lumen) --- Certainty=0.1000(Affirmative) < succ>

INTEGRAL Likelihood = -4.04 Transmembrane 7 - 23 ( 4 - 25)

Seems to be a Type Ib (Nexo Ccyt) membrane protein

Is the sequence a signal peptide?

# Measure  Position  Value  Cutoff  Conclusion
  max. C   31        0.427  0.37    YES
  max. Y   31        0.473  0.34    YES
  max. S   8         0.952  0.88    YES
  mean S   1-30      0.738  0.48    YES

# Most likely cleavage site between pos. 30 and 31: VQS-KS

Hydrophobicity Plot for CG57051-04 with a window of 19



SECP16

A SECP16 nucleic acid and polypeptide according to the invention were obtained by
exon linking and include the nucleic acid sequence (SEQ ID NO:52) and encoded polypeptide
sequence (SEQ ID NO:53) of clone CG57051-05 directed toward novel Angiopoietin-like
proteins and nucleic acids encoding them. Figure 21 illustrates the nucleic acid and
amino acid sequences, respectively. This clone includes a nucleotide sequence (SEQ ID NO:52)
of 1239 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an
ATG initiation codon at nucleotides 80-82 and ending with a TAG stop codon at nucleotides
1184-1186. Putative untranslated regions, if any, are found upstream from the initiation codon
and downstream from the termination codon. The encoded protein having 368 amino acid
residues is presented using the one-letter code in Figure 21. The protein encoded by clone
CG57051-05 is predicted by the PSORT program to be located extracellularly with a certainty of
0.7332 and has a signal peptide (see Table 28 below). The PCR product derived by exon linking,
covering the entire open reading frame, was cloned into the pCR2.1 vector from Invitrogen to
provide clone 157544::CG50847-01.891637.M13 and clone 157544::CG50847-01.891637.O5.
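
The ORF coordinates quoted above can be checked against the stated protein length with a little arithmetic. The following is a minimal sketch, assuming 1-based inclusive nucleotide coordinates; the function name `orf_summary` and the dictionary keys are illustrative and do not appear in the application.

```python
# Coordinates quoted in the text (assumed 1-based and inclusive).
ORF_START = 80      # first base of the ATG initiation codon
ORF_END = 1186      # last base of the TAG stop codon
SEQ_LENGTH = 1239   # total length of the nucleotide sequence in bp


def orf_summary(start: int, end: int, seq_length: int) -> dict:
    """Derive codon and residue counts from inclusive ORF coordinates."""
    orf_length = end - start + 1
    if orf_length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_length // 3
    return {
        "orf_length": orf_length,
        "codons": codons,
        "residues": codons - 1,           # the stop codon encodes no residue
        "utr5_length": start - 1,         # bases upstream of the ATG
        "utr3_length": seq_length - end,  # bases downstream of the stop codon
    }


summary = orf_summary(ORF_START, ORF_END, SEQ_LENGTH)
print(summary)
```

Under these assumptions the ORF spans 1107 nucleotides, i.e. 369 codons including the stop codon, which agrees with the 368 amino acid residues stated for the encoded protein.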

Similarities

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 867 of 1064 bases (81%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (See Table 24). The full amino acid sequence of the
protein of the invention was found to have 185 of 192 amino acid residues (96%) identical to,
and 185 of 192 amino acid residues (96%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE
PROTEIN PP1158) (See Table 25).
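
The identity figures quoted above follow directly from the aligned rows. The sketch below is illustrative only: it counts identical columns in one pair of aligned rows (residues 121-180 of the query and of Q9HBV4, as listed in Table 25) and recomputes the reported percentages. The helper names are assumptions, and the percentages are truncated to whole numbers, which matches the values in the text.

```python
def count_identities(query: str, subject: str) -> int:
    """Count columns where two equal-length aligned rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def percent_identity(identical: int, aligned: int) -> float:
    """Percentage of identical positions over the aligned length."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * identical / aligned


# Residues 121-180 of the query and of Q9HBV4 (row taken from Table 25).
QUERY_ROW = "HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEGGKPARRKRLPEMAQPVDPAHNVSR"
SBJCT_ROW = "HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR"

row_identities = count_identities(QUERY_ROW, SBJCT_ROW)
print(row_identities, "of", len(QUERY_ROW), "columns identical")

# Totals reported for the BLASTN and BLASTP searches.
print(int(percent_identity(867, 1064)), "% nucleotide identity")
print(int(percent_identity(185, 192)), "% amino acid identity")
```

The two mismatched columns in this row fall at positions 156 and 157, the positions named for the V156G and A157G variants.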

A multiple sequence alignment is given in Table 27, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Please note this sequence represents a splice form of Angiopoietin,
missing exon 4, as indicated in positions 183 to 221 and with SNPs: V156G, A157G, T266M.

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:



Model         Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
fibrinogen_C  1/2     184    246 ..    47     123 ..    98.2   4e-27
fibrinogen_C  2/2     288    362 .]    178    272 .]    67.0   3.4e-18
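
For downstream handling, rows of a domain table like the one above can be read into a small record type. This is a hedged sketch: the `DomainHit` class, the regular expression, and the two-character boundary flags it accepts are assumptions made for illustration, and the second row is transcribed here with `..` after the sequence range and `.]` after the model range.

```python
import re
from dataclasses import dataclass


@dataclass
class DomainHit:
    model: str
    index: int      # which hit of this model (the "1" in "1/2")
    total: int      # total hits of this model (the "2" in "1/2")
    seq_from: int
    seq_to: int
    hmm_from: int
    hmm_to: int
    score: float
    e_value: float

    @property
    def seq_span(self) -> int:
        """Number of residues of the protein covered by the hit."""
        return self.seq_to - self.seq_from + 1


# Two-character boundary flags such as ".." or ".]" follow each range.
ROW = re.compile(
    r"(?P<model>\S+)\s+(?P<index>\d+)/(?P<total>\d+)\s+"
    r"(?P<seq_from>\d+)\s+(?P<seq_to>\d+)\s*[.\[\]]{2}\s+"
    r"(?P<hmm_from>\d+)\s+(?P<hmm_to>\d+)\s*[.\[\]]{2}\s+"
    r"(?P<score>-?\d+(?:\.\d+)?)\s+(?P<e_value>\S+)"
)


def parse_hit(line: str) -> DomainHit:
    """Parse one whitespace-separated row of the domain table."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised domain row: {line!r}")
    g = match.groupdict()
    return DomainHit(
        g["model"], int(g["index"]), int(g["total"]),
        int(g["seq_from"]), int(g["seq_to"]),
        int(g["hmm_from"]), int(g["hmm_to"]),
        float(g["score"]), float(g["e_value"]),
    )


rows = [
    "fibrinogen_C 1/2 184 246 .. 47 123 .. 98.2 4e-27",
    "fibrinogen_C 2/2 288 362 .. 178 272 .] 67.0 3.4e-18",
]
hits = [parse_hit(row) for row in rows]
for hit in hits:
    print(hit.model, hit.index, hit.seq_from, hit.seq_to, hit.seq_span, hit.e_value)
```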



IPR002181; (Fibrinogen_C)

Fibrinogen, the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid residues.
As shown in the schematic representation this domain contains four conserved cysteines
involved in two disulfide bonds.

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx
    |            |                                 |     |
    +------------+                                 +-----+

'C': conserved cysteine involved in a disulfide bond.

Such a domain has been recently found in other proteins which are listed below:

1) Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins,
of about 260 amino acids, which have a fibrinogen beta/gamma C-terminal domain.

2) In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in
the regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

3) In the C-terminus of a mammalian T-cell specific protein of unknown function.

4) In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.
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The function of this domain is not yet known, but it has been suggested that it could be 
involved in protein-protein interactions. 

This indicates that the sequence of the invention has properties similar to those of other 
proteins known to contain this/these domain(s) and similar to the properties of these domains. 

Chromosomal information:

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public 
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen 
Corporation's Electronic Northern bioinformatic tool. 

Tissue expression

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Liver, Placenta. Expression information was derived from the tissue
sources of the sequences that were included in the derivation of the sequence of CuraGen Acc. 
No. CG57051-05. 

Cellular Localization and Sorting

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown 
in Table 28. The results predict that this sequence has a signal peptide and is likely to be 
localized extracellularly with a certainty of 0.7332. The signal peptide is predicted by SignalP to 
be cleaved between amino acids 25 and 26: AQG-GP. 
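
The predicted cleavage site can be checked against the N-terminal residues of the protein. The sketch below uses the first 60 residues of the query as listed in Table 25 and splits them after residue 25; the function names and the three-left, two-right window used to print the motif are illustrative assumptions, not part of the SignalP output.

```python
# First 60 residues of the CG57051-05 protein (query row 1-60 of Table 25).
N_TERMINUS = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE"

# SignalP prediction quoted in the text: cleavage between residues 25 and 26.
CLEAVAGE_AFTER = 25


def split_at_cleavage(protein: str, cleavage_after: int) -> tuple[str, str]:
    """Return (signal peptide, remainder) for a 1-based cleavage position."""
    if not 0 < cleavage_after < len(protein):
        raise ValueError("cleavage position lies outside the sequence")
    return protein[:cleavage_after], protein[cleavage_after:]


def cleavage_motif(protein: str, cleavage_after: int, left: int = 3, right: int = 2) -> str:
    """Format the residues flanking the cleavage site, e.g. 'AQG-GP'."""
    return (
        protein[cleavage_after - left:cleavage_after]
        + "-"
        + protein[cleavage_after:cleavage_after + right]
    )


signal_peptide, mature_start = split_at_cleavage(N_TERMINUS, CLEAVAGE_AFTER)
motif = cleavage_motif(N_TERMINUS, CLEAVAGE_AFTER)
print(signal_peptide)
print(motif)
```

The flanking residues recovered this way read AQG on the signal peptide side and GP on the mature side, consistent with the motif given in the text.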

Functional Variants and Homologs

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 21, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 21 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-05, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 19% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 21. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 21 while still encoding
a protein that maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 4% of the amino acid
residues may be so changed.
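
The percentage limits on changed bases and residues can be applied to a candidate sequence by counting substituted positions. The following is a sketch under stated assumptions: it handles substitutions only (no insertions or deletions), it treats "up to about" as a hard ceiling for simplicity, and the two substitutions applied to the reference fragment are hypothetical examples rather than variants taken from the application.

```python
import math

# N-terminal 60 residues of the query protein, as listed in Table 25.
REFERENCE = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE"


def max_changes(length: int, max_fraction: float) -> int:
    """Largest whole number of positions that stays within the fraction."""
    return math.floor(length * max_fraction)


def substitute(sequence: str, changes: dict[int, str]) -> str:
    """Apply 1-based point substitutions to a sequence."""
    residues = list(sequence)
    for position, residue in changes.items():
        residues[position - 1] = residue
    return "".join(residues)


def count_changes(reference: str, variant: str) -> int:
    """Count substituted positions between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles substitutions")
    return sum(1 for r, v in zip(reference, variant) if r != v)


# Hypothetical variant: A10V and L20I, chosen purely for illustration.
variant = substitute(REFERENCE, {10: "V", 20: "I"})
changed = count_changes(REFERENCE, variant)
within_limit = changed <= max_changes(len(REFERENCE), 0.04)
print(changed, within_limit)

# Ceilings implied for the full-length sequences described in the text.
print(max_changes(368, 0.04), "residues of the 368 residue protein")
print(max_changes(1239, 0.19), "bases of the 1239 bp nucleotide sequence")
```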

Chimeric and Fusion Proteins 

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-05 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-05 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion 
proteins described in the preceding paragraph. 
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Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single-chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Angiopoietin family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes, obesity,
colon cancer, diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; Cone-rod retinal dystrophy-2; DNA ligase I deficiency;
Glutaricaciduria, type IIB; Liposarcoma; Myotonic dystrophy as well as other diseases, disorders
and conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in diagnostic and/or 
therapeutic methods. 

Table 24. BLASTN search using CuraGen Acc. No. CG57051-05.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:87)
Length = 1943

Plus Strand HSPs: 



Score = 3105 (465.9 bits), Expect = 2.0e-134, P = 2.0e-134
Identities = 867/1064 (81%), Positives = 867/1064 (81%), Strand = Plus / Plus

CGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGTCCCCGAATCCCCGCTCC 63
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGTCCCCGAATCCCCGCTCC 156

CAGGCTACCTAAGAGGATGAGCGGCGCTCCGACGGCCGGGGCAGCCCTGATGCTCTGCGC 123
|||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
CAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCAGCCCTGATGCTCTGCGC 216

CGCCACCGCCGTGCTACTGAGCGCTCAGGGCGGACCCGTGCAGTCCAAGTCGCCGCGCTT 183
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CGCCACCGCCGTGCTACTGAGCGCTCAGGGCGGACCCGTGCAGTCCAAGTCGCCGCGCTT 276

TGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCTGCAGCTCGGCCAGGGGCT 243
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCTGCAGCTCGGCCAGGGGCT 336




Query: 


4 


Sbjct: 


97 


Query: 


64 


Sbjct: 


157 


Query: 


124 


Sbjct: 


217 


Query: 


184 


Sb j Ct : 


277 


Query: 


244 


Sbj ct : 


337 


Query: 


304 


Sbjct: 


397 


Query: 


364 


Sbjct: 


457 


Query: 


424 


Sbjct: 


517 


Query: 


484 


Sbj ct : 


577 


Query: 


544 


Sbj ct : 


637 


Query: 


603 


Sbj ct : 


696 


Query: 


655 


Sbjct: 


756 


Query : 


714 


Sbj ct : 


815 


Query: 


773 


Sbjct : 


866 


Query : 


831 


Sbjct: 


922 



GCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCTGGAGCGGCGCCTGAGCGC 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIII 

GCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCTGGAGCGGCGCCTGAGCGC 

GTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCTCCCGTTAGCCCCTGAGAG 

IIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIMIIIIIIIIIIIII 

GTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCTCCCGTTAGCCCCTGAGAG 

CCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACTCAAGGCTCAGAACAGCAG 

MIIIIIIIIIIIIIIIIMMIIIIIIIIIIIIIIIIIIMIIIIIIIIIMIIIIIM 

CCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACTCAAGGCTCAGAACAGCAG 

GATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCACCTGGAGAAGCAGCACCT 

IIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIII 



GCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGA 

IIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIII 



303 



396 



3 63 



456 



423 



516 



483 



543 



636 



GGGTGGC-AAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGG 

II III! IIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIII 

GG - TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGG 

CTCACAATGTCAGCCGCCTGCACCA- -TGG- -AGGC-TGGACAGTAA-T-TCAGAGGC-G 
IIIIIIIIIIIIIIMIIIIIIII II III I I III I II III I 
CTCACAATGTCAGCCGCCTGCACCGGCTGCCCAGGGATTGCCAGGAGCTGTTCCAGGTTG 



602 



695 



654 



755 



II 



lllillll 



II I 



II 



I II 



nil 



I II I II II I Ml I lllll I I I II M I II I II I 



AACAGCCGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAAC - - GCCGAGTTGCTGCAGT 830 

I II III lllll III llllll MM I II I I 11 I 

GATGGCTCAGTGGACTT - CAAC - - CGGCCCTGGGAAGCCTACAAGGCGGGGTT- TGGGGA 



II II Ml 



I I 



II I M mil I I 
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Query: 


885 


Sbjct : 


982 


Query : 


942 


Sbjct: 


1038 


Query: 


1000 


Sbjct: 


1092 


Query: 


1059 


Sbjct: 


1146 


Score 


= 3041 


Identities 


Query: 


541 


Sb j Ct : 


754 


Query : 


600 


Sbjct : 


811 


Query : 


659 


Sbjct : 


866 


Query: 


719 


Sbjct : 


926 


Query : 


779 


Sbjct : 


986 


Query : 


839 


Sbjct: 


1046 


Query: 


899 


Sbjct: 


1106 


Query : 


959 


Sbj ct : 


1166 


Query : 


1019 


Sbjct: 


1226 


Query: 


1079 


Sbjct: 


1286 


Query: 


1139 


Sbj ct : 


1346 


Query: 


1199 


Sbjct: 


1406 



I I III Mil II I III II 



I III II I II I II I 



TCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGACA - AGAACTGC - GCCAAGAGCCT 999 

Mill II III II I III I I II I II nil II I II 

TCTCCG - -TGC-ACCTGGGTGGCGAGGACA- C- GGCCTATAGC - CTGCAGCTCACTGCAC 1091 

CTCTGGAGGCTGGTG - GTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCC 1058 
I I I III II II III II II II I III II II Mill I I II 



II I III 



(457.3 bits). Expect « 7. 
658/699 (94%) , Positives 



II II I llll I II I 



4e-132, P = 7.4e-132 

= 658/699 (94%) , Strand « Plus / Plus 



II I III II I II llll II 



II I I I II II I III II IIIIIIIIIIMIIIIIIIIIIIIIIIIIIII 

- GG - TGA - ACTGCAAGATGACCT - CAG - ATGGAGGCTGGACAGTAATTCAGAGGCGCCAC 865 
GATGGCTCAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCC 718 

iiiiiiiiiiiiiMiiiiiiiiMiiiiiiiiiii inn iMiiiii Hill iiiiii 

GATGGCTCAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCC 925 



CACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCATGGGGGACCGCAACAGC 

iiiiiiiininniiiiiiiniiininiiiiniiiii iiiiiniiniiiii 

CACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGC 



778 



985 



CGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTG 83 8 

niiiininiiniiininiiiiinnninniinniiiiniiiiiini 

CGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTG 104 5 
CACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAG 898 

iiiiiiiiiiiiiiiiiiinninniiiniiiininiiiniiiiiiiiiiiii 

CACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAG 1105 
CTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAG 958 

lllllllllllllllliliiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii 

CTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTT(3GGACCAG 1165 
GATCACGACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTT 1018 

lllllllllllllllllllllllllllllllllllllllliliiiiiiiiiiiiiiiill 

GATCAC(3ACCTCCGCAGGGA(ZAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTT 1225 
GGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGG 1078 

iiiiiiiiiii iiiiiiiniiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii 

GGCACCTGCAGCCATTCCAACCTCAACGGCCIAGTACTTCCGCTCCATCCCACAGCAGCGG 1285 
CAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAG 1138 

iiiiiiiiiiiiiiiiiniiniiiiiiiiniiiiiiiiiiiiiiiiniiiiiini 

CAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAG 1345 
GCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGG 1198 

iiiii linn III iiiiiiiiniiniiiiiiiniiiiiiiiiiiiiniiiiini 

gcc:accac(zatgttgatccagccc:atggcagcagaggc:agcctcctagcgtcctggctgg 1 4 o 5 

GCCTGGTCCCAGGCCCACGAAAGA - GGTGACTCTTGGCTCTG 123 9 

iiiiiniiiiiiiiiiniiiii iniiiiinniiiii 

GCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTCSGCTCTG 1447 
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Table 25. BLASTP search using the protein of CuraGen Acc. No. CG57051-05. 

>ptnr :SPTREMBL-ACC;Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens 
(Human), 406 aa. (SEQ ID NO:88)
Length = 406

Score = 1015 (357.3 bits), Expect = 1.6e-197, Sum P(2) = 1.6e-197
Identities = 185/192 (96%), Positives = 185/192 (96%)

Query:   177 NVSRLHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLA 236
             |      |||||||||||||||||||||||||||||||||||||||||||| ||||||||
Sbjct:   215 NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 274

Query:   237 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 296
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   275 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334

Query:   297 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 356
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   335 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 394

Query:   357 MLIQPMAAEAAS 368
             ||||||||||||
Sbjct:   395 MLIQPMAAEAAS 406

Score = 923 (324.9 bits), Expect = 1.6e-197, Sum P(2) = 1.6e-197
Identities = 180/182 (98%), Positives = 180/182 (98%)

Query:     1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:     1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:    61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query:   121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEGGKPARRKRLPEMAQPVDPAHNVSR 180
             |||||||||||||||||||||||||||||||||||  |||||||||||||||||||||||
Sbjct:   121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

Query:   181 LH 182
             ||
Sbjct:   181 LH 182



Table 26. BLASTN identity search of CuraGen Corporation's Human SeqCalling database using CuraGen
Acc. No. CG57051-05.

>s3aq:217939973 , 631 bp. (SEQ ID NO:89)
Length = 631

Minus Strand HSPs:

Score = 2620 (393.1 bits), Expect = 9.1e-113, P = 9.1e-113

Identities = 526/527 (99%), Positives = 526/527 (99%), Strand = Minus / Plus

1 1 II M 1 1 M 1 1 II I II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 

CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 164

GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1121 

60 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 II I 1 1 1 1 1 1 1 1 1 

GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 224 

GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1061 

lllllllllllllllllllllllllllllllllillllllllilllllllllllllllll 



Query: 


1239 


Sb j ct : 


105 


Query: 


1180 


Sbj ct : 


165 


Query: 


1120 






GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 2 84 

GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 1001 

lllllillllllllllllllllllllllllMIIIIIIIIIMIIIIIIIIIIMIIIII 

GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 344 

GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 941 

IIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIINIIIIIIIMIIIIIIIIM 

GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 404 
GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 881 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIII 

GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 4 64 
AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 821 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIMIIIIMII 

AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 524 
CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 761

IMIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 584 
GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGAT 7 14 

IIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIII 

GATGCTATGCACCTTCTCCAQACCCAGCCAGAACTCGCCGTGGGGAT 631 

>s3aq:230121563, 788 bp. (SEQ ID NO:90)
Length = 788

Minus Strand HSPs:

Score = 2583 (387.6 bits), Expect = 3.4e-111, P = 3.4e-111
Identities = 533/548 (97%), Positives = 533/548 (97%), Strand = Minus / Plus

Query: 1239 CAGAGCCAAGAGTCACC-TCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1181
            ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
Sbjct:  171 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 230

Query: 1180 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1121
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  231 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 290

Query: 1120 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1061
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  291 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 350

Query: 1060 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 1001
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  351 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 410

Query: 1000 GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 941
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  411 GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 470

Query:  940 GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 881
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  471 GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 530

Query:  880 AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 821
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  531 AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 590

Query:  820 CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 761
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
Sbjct:  591 CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGT 650

Query:  760 GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGC 701
            ||||||||||||||||||||||||||||||||||||||| ||| |      |  || | |
Sbjct:  651 GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCC-TGGAGTGGGAGAGGCCACTC 709

Query:  700 CTTGTAGGC 692
            | || ||||
Sbjct:  710 CATG-AGGC 717

>s3aq:217940431 Category E: , 530 bp. (SEQ ID NO:91)
Length = 530

Minus Strand HSPs:

Score = 1795 (269.3 bits), Expect = 2.0e-75, P = 2.0e-75
Identities = 381/399 (95%), Positives = 381/399 (95%), Strand = Minus / Plus

Query:  553 CTTGCCACCCTCATGGTCTAGGTG-CTT-GTGGTCCAG-GAGGCCAAACTGGCTTTGCAG 497
            || | | || ||| | || | ||| ||  ||   |||  |||||||  ||||||||||||
Sbjct:  132 CTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTTGCAG 189

Query:  496 ATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAA 437
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  190 ATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAA 249

Query:  436 GAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTC 377
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  250 GAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTC 309

Query:  376 AGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACA 317
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  310 AGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACA 369

Query:  316 GGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTC 257
            ||||||||||||||||||||||||||| | ||||||||||||||||||||||||||||||
Sbjct:  370 GGCGGACCCGCACGCGCTCAGGCGCCGTTTCAGCGCGCTCAGCTGACTGCGGGTGCGCTC 429

Query:  256 CGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTC 197
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  430 CGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTC 489

Query:  196 GTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 156
            |||||||||||||||||||||||||||||||||||||||||
Sbjct:  490 GTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 530

>s3aq:217940613, 336 bp. (SEQ ID NO:92)
Length = 336

Minus Strand HSPs:

Score = 995 (149.3 bits), Expect = 9.4e-56, Sum P(2) = 9.4e-56
Identities = 203/204 (99%), Positives = 203/204 (99%), Strand = Minus / Plus

GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 567 

MMMMMIMMMMMMMMMIMIMMMIMMIMMMMMIMM 

GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 192 
TTCTTCGGGCAGGCTTG - CCACCCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAAC 5 08 

MIIIIIMIIIIIMI Mill IIIMIIMMIIIIIIIIIIMIIIIMIIIIMI 

TTCTTCGGGCAGGCTTGGCCACC - TCATGGTCTAGGTGCTTGTGGTCCAGGAGGCC/U^AC 2 51 
TGGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCC 448 

lllllllllll III III I MM Mill III III IMMMIMMIMMMMMMM 

TGGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCC 311 





Query : 


626 


50 


Sb j Ct : 


133 




Query: 


566 




Sbjct : 


193 


55 






Query : 


507 




Sbjct: 


252 


60 


Query: 


447 




Sbjct: 


312 


65 


Score 


= 410 


Identities ■• 




Query: 


717 



lllllllllllllllllllllllll 



Score = 410 (61.5 bits), Expect = 9.4e-56, Sum P(2) = 9.4e-56
Identities = 86/91 (94%), Positives = 86/91 (94%), Strand = Minus / Plus



lllllllllllllllllllllillllllllllllllllllllilllllllllllllllll 

114 



Sbjct : 



1 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 60 



10 



15 



Query: 657 TGGCGCCTCTGAATTACTGTCCAGCCTCCAT 627 

llllllllllllllll llllll II I I 
Sbjct: 61 TGGCGCCTCTGAATTAATGTCCACTCTGCCT 91 



>s3aq:217939964, 328 bp. (SEQ ID NO:93)
Length = 328

Plus Strand HSPs:

Score = 762 (114.3 bits), Expect = 1.5e-28, P = 1.5e-28
Identities = 156/159 (98%), Positives = 156/159 (98%), Strand = Plus / Plus

Query: 1082 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 1141
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 60

Query: 1142 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 1201
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 120

Query: 1202 TGGTCCCAGGCCCACGAAAGA-GGTGACTCTTGGCTCTG 1239
            |||||||||||| |||||||| ||||||||||||||| |
Sbjct:  121 TGGTCCCAGGCCAACGAAAGACGGTGACTCTTGGCTCCG 159



Table 27. ClustalW alignment of CG57051-05 protein with related proteins. 
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[vl S GAPTA G AALMLCAA TAVLL SA^GP VQ S KS PRFAS WD E MNV L AHGL LQLGQGL RE H AE 
[vISGAPTA G AALMLCAA TAVLLSA^GPVQSKS PRFAS WD E MNV L AHGL LQLGQGL RE HAE 
!vIS GAPTA G AALMLC A A TAVLLSA^GPVQSKS PRFAS WD E MNV L AHGL LQLGQGL RE HAE 
!vlS GAPTA G AALMLC A ATA VLL SAt5a<^PVQSKSPRFAS-WDE MNV L AHGL LQLGQGL RE HAS 





i 


Q9HBV4 


1 


CG57051-04 


1 


CG57051-02 


1 




^1 

Ol 


Q9HBV4 


61 


CG57051-04 


61 


CG57051-02 


61 


f\Jj 1 -KJJ 


191 


Q9HBV4 


121 


CG57051-04 


121 


CG57051.02 


121 




1 Rl 
lot 


Q9HBV4 


181 


CG5705U04 


181 


CG57051-02 


181 


/uji-uj 




Q9HBV4 


241 


CG57051-04 


183 


CG5705I-02 


203 






Q9HBV4 


301 


CG57051-04 


183 


CG57051-02 


263 


CG57051-05 


308 


Q9HBV4 


346 


CG57051-04 


184 


CG57051.02 


323 


CG57051-05 


365 


Q9HBV4 


403 


CG57051-04 


239 


CG57051-02 


383 



RTRSQL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLH S LQTQLKAQN S R I QQL F 
RTRSQL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLHSLQTQLKAQNS.RIQQL'F 
RTRSQL S ALERRLSACGSACQGTEGSTDtiPLAPESRVDPE VLHSLQTQLKAQWSR IQQLF 

RTRSQL s alerrlsacgsacqgtegstd'lpl apesrvdpe vlh S LQTQLKAON S R I qol f 



HKVAQQQ RHLEICQHLRIQHLQSQFGLLDHKHLDHE^KPA RRKRLPEMAQP VDP AHN V SR 
HICVAQQQ RHLEICQHLRIQHLQSQFGLLDHfCHLDHEVAKPA RRKRLPEMAQPVDP AHN V S R 
HKVAQQQ RHLEKQHLRIQHLQSQFGLLDHfOiLDHEVAICPA RRKRLPEMAQPVDP AHN V S R 
HKVAQQQ RHLEICQHLRIQHLQSQFGLLDHKHLDHEVAKPA RRKRLPEMAQP VDP AHN V SR 



Hi 

-PRO CQELFQVGERQSGLFEIQPQGSPPFLVNCKMTS Dg 

::::::::::::::::::::::::::::::::::::: g 



GWTV I QRRHDG S VDFNRP 
GWTV I QRRHDaS VDFNRP 



GWTV I QRRHDG SI 



202 
240 

183 

vT nlJii-U 202 



A^TAYKAG FGDPHGEFWLGLEKVHS H2PDRNSRLAVQL RDWDGNA ELLQFS VHL.GGEDTAY 
^TIAYKAG FGDPHGEFWLGLEKVHS'ITGD'RNSRLAVQL RDWDGNA ELLQFS VHLGtSEDTAY 



WEAYKAG FGDPHGEFWLGLEKVHS;lTGDRNSRLAVOL RDV/D GN A ELLO FS VHL GGEDT AY 



S LQL T A P V AGQ L GAT T VP P S GL S VR F S T WDQDHDL RR D KN C AK S L 
S LQLTA P V AGQL GATT VPP S GLS VPF S^TWDQDHDLRRD KN C AK S L 



S LQL T A P V AGQL GATT VP P S GLS VP. F S T.WDQDHDL RR D KN CAKSL 



262 
300 
183 
262 

307 

345 

183 

SAPSVAQRPDHVPSP 322 



S GGV^^'WFGTC S HS N LNGQYFRS 'I^P QQ RQKLKKG I FWKT WRG R YYP L QATTML I QPMAA 
SGGWWFGTC S HS N LNGQYFRS IP QQ RQKLKKG I FWKT WRG R YYP L QATTML 1 QPMAA 
GWWFGTC S HS N LNGQYFRS I P QQ RQKLKKG I FWKT WRG R YYP L QATTML I QPMAA 
GW^FGTC S HS N LNGQYFRS I P QQ RQKLKKG I FV7KT WRG R YYP L QATTML I QPMAA 



364 
402 
238 
382 

368 
406 
242 
386 



Information for the ClustalW proteins:

Accno        Common Name                             Length
CG57051-05   novel Angiopoietin-like protein         368
CG57051-04   Angiopoietin-like protein-isoform 4     242
CG57051-02   Angiopoietin-like protein-isoform 2     386
Q9HBV4       ANGIOPOIETIN-LIKE PROTEIN PP1158        406

In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.
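
The three-way shading rule described above can be sketched in a few lines of Python. This is only an illustration, not the routine ClustalW or the shading program actually uses: the function names are hypothetical, and the only similarity group taken from the text is L, V, I, M, so any further group would be an additional assumption.

```python
# Illustrative sketch: classify each column of a multiple alignment as
# "identical", "similar", or "different", following the legend above.
# Assumption: only the group {L, V, I, M} is treated as "similar",
# because it is the only group named in the text.
SIMILAR_GROUPS = [set("LVIM")]


def classify_column(residues):
    """Classify one alignment column given the residues found in it."""
    residues = [r.upper() for r in residues]
    if "-" in residues:
        # A gap in any sequence means the column is not conserved.
        return "different"
    if len(set(residues)) == 1:
        return "identical"
    if any(set(residues) <= group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


def classify_alignment(rows):
    """Classify every column of equally long aligned sequences."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must have the same length")
    return [classify_column(column) for column in zip(*rows)]


if __name__ == "__main__":
    # Toy rows, not taken from Table 27.
    print(classify_alignment(["MLKA", "MVRA"]))
```

In the toy call, the first and last columns are identical, the second (L against V) is similar, and the third (K against R) is reported as different only because no basic-residue group has been declared here.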

Table 28. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-05.

outside --- Certainty=0.7332(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.2608(Affirmative) < succ>
endoplasmic reticulum (membrane) --- Certainty=0.1000(Affirmative) < succ>
endoplasmic reticulum (lumen) --- Certainty=0.1000(Affirmative) < succ>

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
  max. C    31         0.306   0.37     NO
  max. Y    26         0.429   0.34     YES
  max. S    8          0.952   0.88     YES
  mean S    1-25       0.848   0.48     YES

# Most likely cleavage site between pos. 25 and 26: AQG-GP
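
The Conclusion column of the SignalP table follows from comparing each value with its cutoff. The short Python sketch below reproduces that comparison for the four rows of Table 28. The class and variable names are hypothetical, and the rule "YES when the value exceeds the cutoff" is an assumption that happens to agree with every row shown; it is not taken from the SignalP documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """One row of the SignalP summary (names are illustrative only)."""
    name: str
    position: str
    value: float
    cutoff: float

    @property
    def conclusion(self) -> str:
        # Assumed rule: a measure supports a signal peptide when its
        # value is above the cutoff.
        return "YES" if self.value > self.cutoff else "NO"


# Values copied from Table 28.
TABLE_28 = [
    Measure("max. C", "31", 0.306, 0.37),
    Measure("max. Y", "26", 0.429, 0.34),
    Measure("max. S", "8", 0.952, 0.88),
    Measure("mean S", "1-25", 0.848, 0.48),
]

if __name__ == "__main__":
    for m in TABLE_28:
        print(f"{m.name:7s} {m.position:5s} {m.value:.3f} {m.cutoff:.2f} {m.conclusion}")
```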

SECP 17 

A SECP 17 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO: 54) and encoded polypeptide sequence (SEQ ID NO: 55) of clone
CG57051-02 directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 22 illustrates the nucleic acid sequence and amino acid sequence,
respectively. This clone includes a nucleotide sequence (SEQ ID NO: 54) of 1315 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 1313-1315.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 386 amino acid residues is
presented using the one-letter code in Figure 22. The protein encoded by clone CG57051-02 is
predicted by the PSORT program to be located extracellularly with a certainty of 0.7332 and has
a signal peptide (see Table 33 below). The PCR product derived by exon linking, covering the
entire open reading frame, was cloned into the pCR2.1 vector from Invitrogen to provide clone
157544::CG50847-01.891637.M13 and clone 157544::CG50847-01.891637.O5. SeqCalling
procedures were also utilized to identify CG57051-02, and the following public components
were thus included in the invention: gb_accno: AC010323 Homo sapiens chromosome 19 clone
CTD-2550O8, WORKING DRAFT SEQUENCE, 55 unordered pieces. In addition, the
following CuraGen Corporation SeqCalling Assembly IDs were also included in the invention:
162377751. The DNA and protein sequences for the novel Angiopoietin-like gene are reported
here as CuraGen Acc. No. CG57051-02.

Similarities 

CG57051-04 is directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 20 illustrates the nucleic acid sequence and amino acid sequence,
respectively. This clone includes a nucleotide sequence (SEQ ID NO: II) of 937 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 242 amino acid residues is
presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is
predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of
0.8200, and appears to be a signal protein (see Table 27 below).
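
The ORF coordinates and protein lengths stated for clones CG57051-02 and CG57051-04 can be cross-checked with simple arithmetic: the span from the first base of the ATG to the last base of the stop codon must be a multiple of three, and the stop codon itself encodes no residue. The following Python sketch performs that check; the function name is hypothetical and the coordinates are assumed to be 1-based and inclusive, as the text suggests.

```python
def protein_length_from_orf(atg_first_nt: int, stop_last_nt: int) -> int:
    """Return the number of residues encoded by an ORF.

    Coordinates are 1-based and inclusive: atg_first_nt is the first base
    of the initiation codon, stop_last_nt the last base of the stop codon.
    """
    orf_nt = stop_last_nt - atg_first_nt + 1
    if orf_nt < 6 or orf_nt % 3 != 0:
        raise ValueError("ORF span must be a multiple of 3 with at least 2 codons")
    # Subtract one codon because the stop codon is not translated.
    return orf_nt // 3 - 1


# (ATG first base, stop codon last base, stated protein length) from the text.
CLONES = {
    "CG57051-02": (155, 1315, 386),
    "CG57051-04": (155, 883, 242),
}

if __name__ == "__main__":
    for name, (first, last, stated) in CLONES.items():
        computed = protein_length_from_orf(first, last)
        print(f"{name}: computed {computed} aa, stated {stated} aa")
```

For CG57051-02 the span is 1161 nucleotides (387 codons, 386 residues), and for CG57051-04 it is 729 nucleotides (243 codons, 242 residues), in agreement with the lengths given above.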

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 696 of 700 bases (99%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (Table 29). The full amino acid sequence of the protein
of the invention was found to have 179 of 182 amino acid residues (98%) identical to, and 180 of
182 amino acid residues (98%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9NZU4 protein from Homo sapiens (Human) (HEPATIC ANGIOPOIETIN-RELATED
PROTEIN) (Table 30).
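
The identity figures quoted above are ratios of matching positions to aligned positions. The Python sketch below recomputes them; the function names are hypothetical. It truncates rather than rounds, which is an inference from the numbers in this section (180 of 182 is 98.9% but is reported as 98%), not a documented property of the search program.

```python
def identity_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percent identity, truncated (assumed reporting style)."""
    if aligned_length <= 0 or not 0 <= matches <= aligned_length:
        raise ValueError("need 0 <= matches <= aligned_length and aligned_length > 0")
    return 100 * matches // aligned_length


def count_identities(query: str, subject: str):
    """Count identical, non-gap columns in two aligned, gapped strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query)


if __name__ == "__main__":
    # Counts quoted in the text for Tables 29 and 30.
    for matches, length in [(696, 700), (179, 182), (180, 182)]:
        print(f"{matches}/{length} = {identity_percent(matches, length)}%")
    # Toy aligned pair (not from the tables) with one gap column.
    print(count_identities("ACGT-A", "ACGTTA"))
```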

A multiple sequence alignment is given in Table 32, with the protein of the invention 
being shown on the first line in a ClustalW analysis comparing the protein of the invention with 
related protein sequences. 

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:

hmmpfam - search a single seq against HMM database
HMMER (Dec 1996)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).

HMM file: pfamHMMs

Sequence file: / data^/ genetool s/Jfspyte)f 39627Cg57051_02Prot!e inFastfa.txtj 

Query: CG57051_02

Scores for sequence family classification (score includes all domains):
Model         Description                                Score    E-value   N
fibrinogen_C  Fibrinogen beta and gamma chains, C-term   143.9    3.6e-40   2

Parsed for domains:
Model         Domain  seq-f  seq-t     hmm-f  hmm-t   score   E-value
fibrinogen_C  1/2     184    246 ..    47     120     102.5   2.4e-28
fibrinogen_C  2/2     288    380       178    272     43.4    1.9e-11

Alignments of top-scoring domains:

fibrinogen_C: domain 1 of 2, from 184 to 246: score 102.5, E = 2.4e-28
*- >G&Vrr\J£ Q rRqD Cs 1 nryRntdk dVk e GTGnl s t s gt GklcYC gl p gE ITS 
G&WrV^QrR DGs ♦r»-R !»«■■•- Vic-*- GrG*-*- gEFW 
C&57051 02 184 G&urUIQRIUroGSHDrHRPlIEAVKACrCDPH GEFW 218 



LGirdlf ihlLTkqgs ipyeLRveLeDwn&et<-* 
LG++)« h++T + L v+L+Dw«-G++ 

CG57051_02 219 LGLEK\«3 ITGDR'-NSRLAVQLRDUDCHA 246 

fibrinogen_C: domain 2 of 2, from 288 to 380: score 43.4, E = 1.9e-11

*->r3TyDrDlin)g'WsTt; spsgnCAesyg gCRG 

r3T-^D D D + ■♦"»-nCA+3 + ++ ++ g& 

CG37051_02 288 FSTXODQDHD — L-~RRDK]SrC AKSLSapsvaqrpdhvpspltpaGG — 328 

aXllii^s ChaANLNGr YY . y Ggty sp qCmaphGtDnGwliilatlinc Gsnq 

WW- C +1ILNG V -•-+ G++W tTiW-G+ 

CG57051_02 329 -IWrGTCSHSllLnGQYrrs ipQQRQKLKK GirWRTWRGR— 366 

AqPGGYwySmk f acHK iRP r<-* 
y •♦•+ •»•+« i P 
CG57051_02 367 YVPLQATTHL IQPH 380 



IPR002181: Fibrinogen [1], the principal protein of vertebrate blood clotting, is a
hexamer containing two sets of three different chains (alpha, beta, and gamma), linked to each
other by disulfide bonds. The N-terminal sections of these three chains are evolutionarily related
and contain the cysteines that participate in the cross-linking of the chains. However, there is no
similarity between the C-terminal part of the alpha chain and that of the beta and gamma chains.
The C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid
residues. As shown in the schematic representation, this domain contains four conserved
cysteines involved in two disulfide bonds.

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx 

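(In the schematic, the first and second cysteines are joined by one disulfide bond, and the
third and fourth cysteines by the other.)

'C': conserved cysteine involved in a disulfide bond.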



Such a domain has been recently found [2] in other proteins which are listed below.

Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins, of
about 260 amino acids, which have a fibrinogen beta/gamma C-terminal domain.

In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in the
regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

In the C-terminus of a mammalian T-cell specific protein of unknown function.

In the C-terminus of a human protein of unknown function which is encoded on the opposite
strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested [2] that it could
be involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other
proteins known to contain this/these domain(s) and similar to the properties of these domains.

Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19q13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.

Tissue expression 

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: adipocytes. Expression information was derived from the tissue sources of the
sequences that were included in the derivation of the sequence of CuraGen Acc. No.
CG57051-02.

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 33. Although PSORT suggests that the Angiopoietin-like protein may be localized in
the nucleus, the protein of CuraGen Acc. No. CG57051-02 predicted here is similar to the
Angiopoietin family, some members of which are secreted. Therefore it is likely that this novel
Angiopoietin-like protein is localized to the same sub-cellular compartment.

Functional Variants and Homologs

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes
the nucleic acid whose sequence is provided in Figure 22, or a fragment thereof. The invention
also includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 22 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-02, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 1% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 22. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 22 while still
maintaining its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 2% of the amino acid
residues may be so changed.
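
To make the stated limits concrete: about 1% of the 1315 bp nucleotide sequence corresponds to roughly 13 bases, and about 2% of the 386-residue protein to roughly 7 residues. The Python sketch below computes these bounds and compares a candidate variant against a reference of equal length. The helper names are hypothetical, "about" is read as a hard percentage rounded down, and only substitutions are considered; all three are assumptions made for illustration and are not part of the disclosure.

```python
def max_changes(length: int, percent: int) -> int:
    """Largest whole number of changed positions within `percent` of `length`."""
    return length * percent // 100


def count_changes(reference: str, variant: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles substitutions (equal lengths)")
    return sum(1 for r, v in zip(reference, variant) if r != v)


def within_limit(reference: str, variant: str, percent: int) -> bool:
    """True if the variant differs at no more than `percent` of positions."""
    return count_changes(reference, variant) <= max_changes(len(reference), percent)


if __name__ == "__main__":
    print(max_changes(1315, 1))  # nucleic acid, about 1%
    print(max_changes(386, 2))   # protein, about 2%
    # Toy 100-base reference with a single substitution.
    print(within_limit("A" * 100, "A" * 99 + "C", 1))
```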

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Angiopoietin family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes; obesity;
colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; cone-rod retinal dystrophy-2; DNA ligase I deficiency;
glutaricaciduria, type IIB; liposarcoma; and myotonic dystrophy, as well as other diseases,
disorders and conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.

Table 29. BLASTN search using CuraGen Acc. No. CG57051-02.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:94)
Length = 1943

Plus Strand HSPs:

Score = 3448 (517.3 bits), Expect = 8.3e-233, Sum P(2) = 8.3e-233
Identities = 696/700 (99%), Positives = 696/700 (99%), Strand = Plus / Plus

Query:    2 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 61
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   20 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 79

Query:   62 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 121
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   80 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 139

Query:  122 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 181
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  140 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 199

Query:  182 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCT-AGATCTGGACCCGTGCA 240
            |||||||||||||||||||||||||||||||||||||||||| ||  | |||||||||||
Sbjct:  200 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGC-GGACCCGTGCA 258

Query:  241 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  259 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 318

Query:  301 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  319 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 378

Query:  361 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  379 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 438

Query:  421 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  439 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 498

Query:  481 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  499 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 558

Query:  541 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  559 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 618

Query:  601 CAAGCACCTAGACCATGAGGTGGCCAAACCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660
            ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
Sbjct:  619 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 678

Query:  661 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACC 701
            |||||||||||||||||||||||||||||||||||||||||
Sbjct:  679 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACC 719



Score = 1887 (283.1 bits), Expect = 8.3e-233, Sum P(2) = 8.3e-233
Identities = 399/415 (96%), Positives = 399/415 (96%), Strand = Plus / Plus



CCTGCACCATGGAGGCTGGAC7VGTAATTCAGAGGCGCCACGATGGCTCAATGGACTTCAA 753 

45 ~ III M lllllllllllllllllllllllll llllllllll llllll llllllllll 

CCT- CAG- ATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAA 885 

CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 813 

I II I I I II I I I I I I I I I I I I I I I M II I I I II I II I I II I II I I I I I I I I II II I I I II I 

50 Sbjct: 886 CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGG6 945 

TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 873 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 1005 



Query: 


694 


Sbjct: 


828 


Query: 


754 


Sbjct : 


886 


Query : 


814 


Sbj ct : 


946 


Query: 


874 


Sbjct: 


1006 


Query: 


934 


Sbjct: 


1066 


Query : 


994 


Sbj ct : 


1126 


Query: 


1054 


Sbjct: 


1186 



llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 



1065 



60 Query: 934 GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 993 

IMIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 
GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 1125 

ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1053 

65 ' II I II I 1 1 1 I I 1 1 1 1 1 1 I I 1 1 M I M II I I 1 1 II II I I 1 1 I I I I I I II II II II II II II 

ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1185 
CAAGAACTGCGCCAAGAGCCTCTCTGCCCCATCGGTGGCTCAAAGACCTG - A- CCAT 1108 

llllll III INI I III II II I Mil MUM I Mill I MM 

Sbjct: 1186 CAAGAACTGCGCCAAGAGCCTCTCTGGAGGCT-GGTGGTTTGGC-ACCTGCAGCCAT 1240

Score = 936 (140.4 bits), Expect = 6.1e-190, Sum P(2) = 6.1e-190
Identities = 312/407 (76%), Positives = 312/407 (76%), Strand = Plus / Plus

CCGTGCACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCG 968 

iiiiMi III I I I III III I III I II I mil I III 

CCGTGCAGCTGCGGGACTGGGAT- -GGCA-AC-GCC-G-A6TTG-CTGCAGTTCT- -CCG 1043 
uuecv GCCAGCTGGGCGCC-ACCAC-CGTCCCAC--CCAGCGGCCTCTCCGTACCCTTCTCCACT 1024 

10 II llllll III! II II M II I I mil II 



15 
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Query: 


909 


Sbjct: 


993 


Query : 


969 


Sbjct: 


1044 


Query: 


1025 


Sbjct: 


1103 


Query : 


1083 




1155 


Query: 


1136 


Sbjct: 


1214 


Query : 


1196 


Sbjct: 


1274 


Query: 


1256 


Sbjct: 


1334 
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I I II Ml III III II I I M I II I IN III II 

CAG - - CTGGGCGCCACCACCGTCC - CACCCAGCGGC - CT - CTCCGT - ACCCT - TCT - CCA 1154 
CATCGGT GGCTCAAAGACCTGACCATGTTCCCT- - CTCC- CCT-GACCCCGGCAGGA 113 5 

I I II II III mil nil II I II II N I III 

CTTGGGACCAGGATCAC-GACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGA 1213 
GGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATC 1195 

IIMIIIIMMIMIIIMIIIIIIIIIIIMMMIIIIIIIIIIIIIIIIMMIM 

GGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATC 1273 



25 Ouerv- 1196 CCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTAC 1255 

IIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIMII 

CCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTAC 
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TACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAG 1315 

30 ^""^ MM II I INI llllll III I II INI I III Ml IIIIMI Mill II Ml 1 1 IIIIMI _ 

TACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAG 1393 

Table 30. BLASTP search using the protein of CuraGen Acc. No. CG57051-02.

>ptnr:SPTREMBL-ACC:Q9NZU4 HEPATIC ANGIOPOIETIN-RELATED PROTEIN - Homo sapiens
(Human), 406 aa. (SEQ ID NO:95)
Length = 406

Score = 919 (323.5 bits), Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 179/182 (98%), Positives = 180/182 (98%)

MSGAPTAGAALMLC/VATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60 

IMMIMMMMMMIIIII^ I M M I II II I II M I II II II I M 1 1 1 1 1 1 1 1 1 1 _ 



45 Ouerv- 61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120 

IIIIIIIIIIIIMIIIMIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVIiHSLQTQLKAQNSRIQQLF 120 



hkvaqqqrhlekqhlriqhlqsqfglldhkhldhevakparrkrlpemaqpvdpahnvsr 

50 MIIMMMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII Mill 

hkvaqqqrhlekqhlriqhlqsqfglldhkhldhevakparrkrlpemaqpvdpphnvsr 



Identities 


Query: 


1 


Sb j Ct : 


1 


Query: 


61 


Sbjct: 


61 


Query: 


121 


Sbjct: 


121 


Query : 


181 


Sb j ct : 


181 


Score 


= 670 


Identities = 


Query: 


177 


Sbjct: 


215 


Query : 


237 


Sbjct : 


275 



180 
180 



LH 



tpect = 4.9e-194, Sum P(3) = 4.9e-194 
Positives = 124/132 (93%) 



60 Ouerv 177 NVSRLHHGGWTVIQRRHDGSMDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 236 

I IIIIIIIIMIII^MIMIIIMIIIMIIIIIIIIMIIMI IIIIIIM 

NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLA 274 

VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 2 96 

65 M 1 1 II II M 1 1 II 1 1 1 M II II 1 1 1 M I II I II I II 1 1 M M 1 1 1 M 1 1 1 1 1 1 Ml 1 1 

VQLRDWDGNAELLQFSVHLGGEDTAYSLQFTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 

124 
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60 



Query: 


297 


Sb j ct : 


335 


Score 


=» 331 


Identities 


Query: 


326 


Sbjct : 


346 


Query: 


386 


Sbjct: 


406 


Score 


=» 46 



llllllllllll 

XiRRDKNCAKSLS 346 

Score = 331 (116.5 bits), Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 59/61 (96%), Positives = 60/61 (98%)

AGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEAA 

HIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII llllllllllllllll 
SGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYSLQATTMLIQPMAAEAA 

S 386 
I 

S 406 

Score = 46 (16.2 bits), Expect = 5.9e-33, Sum P(2) = 5.9e-33
Identities = 14/40 (35%), Positives = 19/40 (47%)

Query:   255 LGGEDTA-YSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQ 293
             + |  ||  +| | |  |  | |   |    |  |++||+
Sbjct:     1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDE 40

Score = 45 (15.8 bits), Expect = 7.6e-33, Sum P(2) = 7.6e-33
Identities = 13/40 (32%), Positives = 19/40 (47%)

Query:     1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDE 40
             + |  ||  +|   |  |  | | + |    |  |++||+
Sbjct:   293 LGGEDTA-YSLQFTAPVAGQLGATTVPPSGLSVPFSTWDQ 331



Table 31. BLASTN identity search of CuraGen Corporation's Human SeqCalling
database using CuraGen Acc. No. CG57051-02.

>s3aq:162377751 Category D: , 1920 bp. (SEQ ID NO:96)
Length = 1920

Minus Strand HSPs:

Score = 3448 (517.3 bits), Expect = 1.5e-233, Sum P(2) = 1.5e-233
Identities = 696/700 (99%), Positives = 696/700 (99%), Strand = Minus / Plus

40 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 

TTCTTCGGGCAGGTTTGGCCACCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACT 

45 ' 1 1 1 1 1 1 1 1 1 II 1 1 I II II II I II 1 1 1 II M 1 1 1 1 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 

rTCTTCGGGCAGGCTTGGCCACCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACT 

GGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCA 

IIMIIIIMIIMIIillillllllllllllllllllllllMIIIIIIIIMIIIIII 
SGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCA 

CCTTGTGGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGT 

IIIIIIIIIIMII llllllllllllllllllllllllllllll llllllllllllllll 
CCTTGTGGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGT 

GAAGGACCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTC6G 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIII 
SAAGGACCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGG 

TTCCCTGACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGC 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIINIIIMIMIMIII 



Identities 


Query: 


701 


Sbjct: 


1221 


Query: 


641 


Sbjct: 


1281 


Query: 


581 


Sb j ct : 


1341 


Query: 


521 


Sbjct: 


1401 


Query: 


461 


Sb j ct : 


1461 


Query: 


401 


Sbjct: 


1521 


Query : 


341 



341 GGGTGCGCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGA 282 
65 ~ I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
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10 



15 



20 



Sb j Ct : 


1581 


Query : 


281 


Sbj ct : 


1641 


Query : 


222 


Sbjct : 


1700 


Query : 


162 


Sbj ct : 


1760 


Query: 


102 


Sbjct: 


1820 


Query: 


42 


Sbj ct : 


1880 



GGGTGCGCTCCGCX3TGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGA 1640 

CATTCATCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCAGATCT -A 223 

llllllllllllllllllllllllllllllllllllllllllllllllllll I II I 
CATTCATCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCC-GCCCTGA 1699 

GCGCTCAGTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCA 163 

Mllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

GCGCTCAGTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCA 1759 



Mill 



HIM 



IIIIIMIIIIIIIIIIIIIIIIII 



43 



TTCCAGGTGCGAGGACTGGAGACGCGGAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAA 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
TTCCAGGTGCGAGGACTGGAGACGCGGAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAA 1879 



GCCGCTGGAAAGAATCGGATCACAGTCGTGTGAGGATCCGC 

Mlllllllllllllllllllllllllllllllllllllll 
GCCGCTGGAAAGAATCGGATCACAGTCGTGTGAGGATCCXSC 



1920 



25 



Score = 1887 (283.1 bits), Expect = 1.5e-233, Sum P(2) = 1.5e-233
Identities = 399/415 (96%), Positives = 399/415 (96%), Strand = Minus / Plus

ATGG - T- CAGGTCTTTGAGCCACCGATGGGGCAGAGAGGCTCTTGGCGCAGTTCTTGTCC 1051 

III! I Mill I Mill I IIIIMIIIIIIMIIIIIIIIIIIIMI 

ATGGCTGCAGGTGCCAAA- CCACC- AGCCTCCA6AGAGGCTCTTGGCGCAGTTCTTGTCC 7 57 

CTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAAGGGTACGGAGAGGCCGCTGGGTGGG 991 

IIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIII 

CTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAAGGGTACGGAGAGGCCGCTGGGTGGG 817 

ACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGCAGTGAGCTGCAGGCTATAGGCCGTG 931 

IIMMIIMMMIMIMIMMIIIIMIIIIIIIIIIIIMIIIIMIII III I II 

ACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGCAGTGAGCTGCAGGCTATAGGCCGTG 877 

TCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAACTCGGCGTTGCCATCCCAGTCCCGC 871 

IMIIIillllMIIMIIMIMMMMMMMIlllMMIMMMMIIIMM 

TCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAACTCGGCGTTGCCATCCCAGTCCCGC 937 

AGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGTGATGCTATGCACCTTCTCCAGACCC 811 

IIIMIIIIIIIIIIIIIIIIIMIIIIIIIIMIIIIIIIIIIIIIIIIilMIIIIII 
AGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGTGATGCTATGCACCTTCTCCAGACCC 997 

AGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTG 751 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIII IIMIIIIIIIIIIMIIIIII 

AGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTG 1057 





Query: 


1108 




Sbj ct : 


700 


30 


Query : 


1050 




Sbjct: 


758 


35 


Query: 


990 




Sbjct: 


818 




Query : 


930 


40 


Sbjct: 


878 




Query : 


870 


45 


Sbjct: 


938 




Query : 


810 




Sbjct: 


998 


50 


Query : 


750 




Sbjct : 


1058 




Score 


= 936 


55 


Identities ' 




Query : 


1315 


60 


Sbjct: 


547 




Query: 


1255 




Sbjct : 


607 


65 


Query: 


1195 




Sbj ct : 


667 


70 


Query: 


1135 



MIIIIMIIIIIIIMIMMMIIIIIMIIIIIIM II Ml 



Score = 936 (140.4 bits), Expect = 1.1e-190, Sum P(2) = 1.1e-190
Identities = 312/407 (76%), Positives = 312/407 (76%), Strand = Minus / Plus
1256 



CTAGGAGGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTA 

MIIIIIIIIMIIIIIIMIIIIIIIMIIIIMIIIIIIIIIIIMIIIIIIMIMI 

CTAGGAGGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTA 606 



MMMMMMIMIIIMIIIIIIMMIIIMIIMIIMIIMMIMMIIIIII 



GATGGAGCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCC 1136 

IIMMIMIMM Mill Mill Ml Ml Mill MMMMMMIMMMMIIM 

GATGGAGCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCC 726 



TCCTGCCGGGGTCAGGG - G - AGAGG - 
Ml I M II M I II 



- GAACATGGTCAGGTCTTTGAGCCA CCGATG 1083 

I I II MM! Mill MM 
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15 



Sbjct: 727 TCOVGAGAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGT-GATCCTGGTCCCAAG 785 

Query: 1082 GGGCAGAGAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGA-GGTCGTGAT-CCTGGTCCCA 1025 

II III III I II I II I I II III III III II I I 
Sbjct: 786 TGG-AGA-AGGGTAC-GGAG-AGGCCGC-TGGGTG-GGACGGTGGTGGCGCCCAG--CTG 837 

Query: 1024 AGTGGAGAAGGGTACGGAGAGGCCGCTGGGTG- -GGACG-GTGGTGGCG-CCCAGCTGGC 969 

II I Mil I Mill II II I II II II I II Hill II 
Sbjct: 838 GCCGGCCACGGGTGCAGTGAG-CTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCA 896 

Query: 968 CGGCCACGGGTGCAGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGG 909 

III I Mill I 11 I III I Ml III III Ml MIMM 

Sbjct: 897 CGGAGAAC--TGCAGCAA-CT-C-GGCGTT--GCCATC-CCAGTCC-CGCAGCTGCACGG 947 



Table 32. ClustalW alignment of CG57051-02 protein with related proteins.



CG57051_02 
Q9NZU4 

CG5705l_02 
Q9NZU4 

CG5705l_02 
Q9NZU4 

CG57051_02 
Q9NZU4 

CG5705l_02 
Q9NZU4 

CG5705l_02 
Q9NZU4 

CG5705l_02 
Q9NZU4 

CG57Q51„02 
Q9NZU4 



vISGAPTA GAALMLCAATAVLLSA^GPVQSiCSPRFASWDEMNVLAHGLLQLGQGLREHAE 

visGAPTA gaalmlcaatavllsaSSgpvqsksprfaswdemnvlahgllqlgoglrehae 



RTRSQL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLH S LQTQLKAQN S R I QQL F 
RTRSQL S ALERRLSACGSACQGTEGSTDLPLAPESRVDPE VLH S LQTQLKAQNSR IQQLF 



HKVAQQQ RHLEKQHLR IQHLQSQFGLLDHKHLDHEVAKPA RRKRLPEMAQPVDPgHN VSR 
H KVAQQQ RHLE KQHL R I QHLQSQFGL LDH KHLDHE VAKP A RRKRLPEMAQP VDPiaHN V S R 



LHRLPRD CQEL FQVGERQSGLFE IQPQG S PPFLVHCK^IT S 



AYKAG FGDPHGE FWLGLEKVHS I li^D RNS RLAVQL RDWDGN A ELLQFS VHL GGEDT AY 
VE AYKA G F GDP HGE FWLGLEKVHS I B^D RNS RL A VQL R DWD GNAELL Q FSVHLGG EDT AY 



SLQWTAP VAGQLGATTVPPSGLSVPFSTWDQDHDLRRDKN C AK.S LS APS VAQRPDHVP S P 
S LqHtA P V AGQL GATT VPPS GLS VP FS TWDQDHDLRRD KN C AKS LS 



LTPAGGWWFGTCSHS NLKGQYFRS I PQQRQKLKKGI FWKT WRGRYYgLQATTML I QPMAA 
3GWWFGTC SHS NLNGQYFRS I P QQ RQKLKKG I FWKT WG R YYgL OATTML I QPMAA 



Information for the ClustalW proteins:

Accno        Common Name                              Length
CG57051_02   novel Angiopoietin-like protein          386
Q9NZU4       HEPATIC ANGIOPOIETIN-RELATED PROTEIN.    406

In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.






Table 33. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-02.

endoplasmic reticulum (membrane) --- Certainty=0.8200 (Affirmative) < succ>
microbody (peroxisome)           --- Certainty=0.3008 (Affirmative) < succ>
plasma membrane                  --- Certainty=0.1900 (Affirmative) < succ>
endoplasmic reticulum (lumen)    --- Certainty=0.1000 (Affirmative) < succ>

INTEGRAL Likelihood = -4.04 Transmembrane 7-23 (4-25)
Seems to be a Type Ib (Nexo Ccyt) membrane protein

Is the sequence a signal peptide?
# Measure  Position  Value  Cutoff  Conclusion
  max. C   31        0.427  0.37    YES
  max. Y   31        0.473  0.34    YES
  max. S   8         0.952  0.88    YES
  mean S   1-30      0.738  0.48    YES
# Most likely cleavage site between pos. 30 and 31: VQS-KS

[Figure: hydropathy plot for CG57051_02 with a window of 19; horizontal axis: Amino Acid Number]



SECP18

A SECP18 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:56) and encoded polypeptide sequence (SEQ ID NO:57) of clone
CG57051-03 directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 23 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:56) of 1150 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 44-46 and ending with a TAG stop codon at nucleotides 1148-1150.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 368 amino acid residues is
presented using the one-letter code in Figure 23.
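
The stated ORF coordinates and protein length can be checked against each other with simple arithmetic. The sketch below assumes 1-based, inclusive nucleotide coordinates and that the stop codon contributes no residue; the function name orf_protein_length is hypothetical and used only for illustration.

```python
def orf_protein_length(start_first_nt: int, stop_last_nt: int) -> int:
    """Number of residues encoded by an ORF.

    start_first_nt: 1-based position of the first base of the initiation codon.
    stop_last_nt:   1-based position of the last base of the stop codon.
    """
    span = stop_last_nt - start_first_nt + 1  # inclusive length in nucleotides
    if span < 6 or span % 3 != 0:
        raise ValueError("ORF span must be a whole number of codons")
    return span // 3 - 1  # the stop codon encodes no residue


# ATG at nucleotides 44-46, TAG at nucleotides 1148-1150.
print(orf_protein_length(44, 1150))
```

For the coordinates given above the span is 1107 nucleotides, or 369 codons including the stop codon, which agrees with the 368 amino acid residues stated for the encoded protein.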

The protein encoded by clone CG57051-03 is predicted by the PSORT program to be
located extracellularly with a certainty of 0.7332 and has a signal peptide (see Table 38 below).
The PCR product derived by exon linking, covering the entire open reading frame, was cloned
into the pCR2.1 vector from Invitrogen to provide clone
134276::130294::PPAR-gamma.698782.P15. The DNA and protein sequences for the novel
Angiopoietin-like gene are reported here as CuraGen Acc. No. CG57051-03.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 837 of 1031 bases (81%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (Table 34). The full amino acid sequence of the protein of
the invention was found to have 184 of 192 amino acid residues (95%) identical to, and 184 of
192 amino acid residues (95%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE PROTEIN
PP1158) (Table 35).
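
The percentages quoted above follow from the match counts. As a minimal sketch, the helper below (blast_percent, a hypothetical name) truncates the ratio to a whole percent; truncation rather than rounding is an assumption, inferred from figures such as 184/192 being reported as 95% in this section.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


# Counts quoted in the Similarities paragraph (nucleotide, then protein).
for matches, aligned in [(837, 1031), (184, 192)]:
    print(f"{matches}/{aligned} = {blast_percent(matches, aligned)}%")
```

With rounding to the nearest integer, 184/192 would come out as 96%, so the truncating form is the one that reproduces the figures reported in Tables 34 and 35.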

A multiple sequence alignment is given in Table 37, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Please note this sequence represents a splice form of Angiopoietin as
indicated in positions 183 to 221.

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the InterPro domain accession number. Significant domains are summarized below:

Model          Domain  seq-f  seq-t  hmm-f  hmm-t       score  E-value
fibrinogen_C   1/2       184    246     47    123  ..   102.6  2.2e-28
fibrinogen_C   2/2       288    362    178    272  .]    61.3  1.4e-16

IPR002181; (Fibrinogen_C)

Fibrinogen, the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds.

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx
    |            |                                 |     |
    +------------+                                 +-----+

'C': conserved cysteine involved in a disulfide bond.

Such a domain has been recently found in other proteins which are listed below:

1) Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins,
of about 260 amino acids, which have a fibrinogen beta/gamma C-terminal domain.

2) In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in
the regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

3) In the C-terminus of a mammalian T-cell specific protein of unknown function.

4) In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be
involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other 
proteins known to contain this/these domain(s) and similar to the properties of these domains. 

Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.

Tissue expression 

The Angiopoietin-like gene disclosed in this invention is expressed in at least the 
following tissues: Adipose, Liver, Placenta. Expression information was derived from the tissue 
sources of the sequences that were included in the derivation of the sequence of CuraGen Acc. 
No. CG57051-03. 

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 38. The results predict that this sequence has a signal peptide and is likely to be
localized extracellularly with a certainty of 0.7332. The signal peptide is predicted by SignalP to
be cleaved between amino acids 25 and 26: AQG-GP.
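
The predicted cleavage position can be illustrated directly on the N-terminal residues of the protein. This is a sketch only: the 40-residue string is the N-terminus as it appears in the query line of the BLASTP alignment in Table 35, and split_signal_peptide is a hypothetical helper, not part of SignalP.

```python
def split_signal_peptide(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a protein after the given 1-based residue number.

    Returns (signal_peptide, mature_chain).
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 40 residues of the CG57051-03 protein (query line, Table 35).
n_terminus = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDE"

signal, mature = split_signal_peptide(n_terminus, 25)
print(f"{signal[-3:]}-{mature[:2]}")  # residues flanking the predicted site
```

Cutting after residue 25 leaves AQG as the last three residues of the signal peptide and GP as the first two residues of the mature chain, matching the AQG-GP site quoted above.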

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 23, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 23 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-03, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 19% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 23. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 23 while still encoding
a protein that maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 5% of the amino acid
residues may be so changed.
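
To give a sense of scale, the variant limits stated above (about 19% of bases, about 5% of residues) can be converted into approximate counts for the 1150 bp nucleotide sequence and the 368 residue protein of clone CG57051-03. The sketch below is illustrative arithmetic only; the function name max_changes is hypothetical, and rounding down to a whole position is an assumption, since the text says only "up to about".

```python
def max_changes(length: int, percent: int) -> int:
    """Largest whole number of positions not exceeding percent% of length."""
    if length < 0 or not 0 <= percent <= 100:
        raise ValueError("length must be non-negative and percent within 0-100")
    return length * percent // 100


# 1150 bp nucleotide sequence, up to about 19% of bases changed.
print("bases:", max_changes(1150, 19))
# 368 amino acid protein, up to about 5% of residues changed.
print("residues:", max_changes(368, 5))
```

Under these assumptions the limits correspond to roughly 218 bases and roughly 18 amino acid residues.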

Chimeric and Fusion Proteins 

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-03 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-03 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion
proteins described in the preceding paragraph.

Antibodies

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Fibrinogen family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes, obesity,
colon cancer, diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; Cone-rod retinal dystrophy-2; DNA ligase I deficiency;
Glutaricaciduria, type IIB; Liposarcoma; Myotonic dystrophy; as well as other diseases, disorders
and conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.

Table 34. BLASTN search using CuraGen Acc. No. CG57051-03.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:97)
Length = 1943

Plus Strand HSPs: 






Score = 2967 (445.2 bits), Expect = 3.2e-128, P = 3.2e-128
Identities = 837/1031 (81%), Positives = 837/1031 (81%), Strand = Plus / Plus



Query : 


1 


Sb J Ct^ : 


X J u 


Que icy : 


6 1 


Sbj Ct, : 


190 


Query : 


121 


Sbjct : 


250 


Query : 


181 


Sbjct: : 


310 


Query : 




SD^ Cu : 


370 


Query : 




Sbj ct : 


J U 


Query : 


J O X 


0£)J cc . 


490 




421 


Sbj Ct : 


550 




481 


Sbj ct : 


610 


Query : 


541 


Sbjct : 


670 


Query : 


597 


Sbjct: 


730 


Query: 


653 


Sbjct : 


789 


Query: 


711 


Sbjct : 


848 


Query: 


771 


Sbj ct : 


897 


Query : 


823 



CCCCGAGAGTCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGAC 

IIIIIIIIIIMIIIIIIMIIIIIIIIIIIIIIIIIIIIIIMIMIMIIIIIIIIM 

CCCCGAGAGTCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGAC 

GGCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGCGG 

IIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIMIIIIIIIIIII 

GGCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGCGG 

ACCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCA 

IIIIIIIIIIIMIIIIIIllllMIIIIMIIIillllMlllllllllllllllllll 

ACCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCA 

CGGACTCCTGCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCT 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
CGGACTCCTGCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCT 

GAGCGCGCTGGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTC 

IIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIII 

GAGCGCGCTGGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTC 
CACCGACCTCCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCA 

IIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIM 

CACCGACCTCCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCA 
GACACAACTCAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCA 

IIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIMMIIIIIII 

GACACAACTCAAGGCTCAG/^CAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCA 
GCAGCGGCACCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCT 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIMIIII 

GCAGCGGCACCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCT 
CCTGGACCACAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCC 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CCTGGACCACAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCC 

CGAGATGGCCCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCA- -TGG- -AG 

llllllllllllllllllllllllllllllllllllllllllllllllll II II 
CGAGATGGCCCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGGCTGCCCAG 



60 



189 



120 



249 



180 



309 



240 



369 



300 



429 



360 



489 



420 



549 



480 



609 



540 



669 



596 



729 



III! II III I 



II I 



llllllll 



II I 



GA 



' AGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGG- GTCTGGAGA 710 

II I II Mill II I II I II II I III I mil I 



I I II II I II I II I I II III I I II I III HIM 



I I MM II I III I I I I I II II I MM III 



III I 
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JL y yi ""ll-Sni fe ^ 






Sbj ct : 


955 < 


Query: 


879 ' 


Sbjct : 


1015 ' 


Query: 


939 < 


Sbjct : 


1066 < 


Query: 


996 . 


Sbj ct : 


1121 ( 


Score 


o 2774 


Identities = 


Query: 


583 


Sbjct: 


828 


Query: 


643 


Sbj ct : 


886 


Query : 


703 


Sbjct: 


946 


Query : 


763 


Sbjct : 


1006 


Query : 


823 


Sbjct : 


1066 


Query: 


883 


Sbjct: 


1126 


Query : 


943 


Sbjct : 


1186 


Query: 


1003 


Sbjct: 


1246 


Query: 


1063 


Sbjct: 


1306 


Query: 


1123 


Sbj ct : 


1366 



II lllll III II Mil I Ml nil II I III II 

GGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCGGGACTGGGA 1014 
TCCCACCCAGCGGCCTCTCCGTACCCTTCCCCACTTGGGACCAGGATCACGACCTCCGCA 938 

I II I III I I I M III II II III II I III II 

TGGCAAC- -GCCGAGT-TGC-TGCAGTTCTCCG- -TGC-ACCTGGGTGGCGAGGACA-C- 1065 
GGGACA - AGAACTGC - GCCAAGAGCCTCTCTGGAGGCTGGTG- GTTTGGCACCTGCAGCC 995 

II I II nil II I II I I I III II II III II II II 

GGCCTATAGC - CTGCAGCTCACTGC7VCC - C - - GTGGCCGGCCAGCTGGGCGCCACCA - CC 1120 



I III II II lllll I I II II I III 



1.6e-119, P = 1.6e-119 



Plus / Plus 



CCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAA 

III II llllllllllllllllllllllllllllllllllllllllllllllllllll 
CCT - GAG- ATGGAGGCTGGACAGTAATTCA6AGGCGCCACGATGGCTCAGTGGACTTCAA 

CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 



642 



885 



702 



945 



TCTGGAGAAGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 762 

llllllllllll MIIIIIIIIIIIIIIMIIIIMIIIIIIIIIIIIIIIIIIIIIII 

TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 1005 

GGACTGGGATGACAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 822 

lllllllllll IIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIM 

GGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACTVC 1065 

GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 882 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIII 

GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 112 5 



ACCCAGCGGCCTCTCCGTACCCTTCCCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 

lllllllllllllllllllllllll IIIIIIIMIIIMIMIIIIIIIIMIIIIIII 



942 



1185 



CAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAA 1002 

IIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAA 1245 



CCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAAT 

IIIIIIIIIIIMIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAAT 



1062 



1305 



CTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCA 1122 

IIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCA 1365 

GCCCATGGCAGCAGAGGCAGCCTCCTAG 1150 

IIIIIIIIIIIIIIIIIIMIIIIIIII 
GCCCATGGCAGCAGAGGCAGCCTCCTAG 1393 



60 



Table 35. BLASTP search using the protein of CuraGen Acc. No. CG57051-03.

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (SEQ ID NO:98)
Length = 406

Score = 1009 (355.2 bits), Expect = 4.3e-198, Sum P(2) = 4.3e-198
Identities = 184/192 (95%), Positives = 184/192 (95%)



135 



NVSRLHHGGWTVIQRRHDGS VDFNRPWEAYKAGFGDPHGEFWLGLEKVHS I TGDRNSRLA 2 36 

I IIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHS I TGDRNSRLA 274 
VQLRDWDDNAELLQFSVHL6GEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFPTWDQDHD 2 96 

Mlllll llllllllllllllllllllllllllllllllllllllllllll lllllll 

VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334 

10 Query: 297 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPIiQATT 356 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
LRRDKNCAKSLSGGWWFGTCSHSNLNGQYPRS I PQQRQKLKKGI FWKTWRGRYYPLQATT 394 

MLIQPM/^EAAS 368 

IS ' llllllllllll 

MLIQPMAAEAAS 4 06 

Score = 934 (328.8 bits), Expect = 4.3e-198, Sum P(2) = 4.3e-198
Identities = 182/182 (100%), Positives = 182/182 (100%)

MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 6 0 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 6 0 



20 



35 



40 



Que ry : 


177 


Sbjct : 


215 


Query : 


237 


Sbjct: 


275 


Query: 


297 


Sbjct: 


335 


Que ry : 


357 


Sbjct: 


395 


Score 


= 934 


Identities > 


Query: 


1 


Sbjct : 


1 


Query: 


61 


Sbjct: 


61 


Query: 


121 


Sb j Ct : 


121 


Query: 


181 


Sbjct : 


181 



25 Query: 61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMII 

RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 12 0 

HKVAQQQRHLEKQHLRIQHLQSQFGLIiDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 18 0 

30 ~ I I 1 1 1 I 1 1 1 1 1 1 1 1 I 1 1 I I 1 1 1 I 1 1 1 1 I M I I M I 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 I I I I I I 1 1 1 

HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180 



II 



Table 36. BLASTN identity search of CuraGen Corporation's Human SeqCalling
database using CuraGen Acc. No. CG57051-03.

>s3aq:189266374 Sequence 5 from Patent WO0105825 (AX079971.1: 100%/409,
p=1.2e-238), 550 bp. (SEQ ID NO:99)
Length = 550

Plus Strand HSPs:

Score = 2723 (408.6 bits), Expect = 1.8e-117, P = 1.8e-117
Identities = 547/550 (99%), Positives = 547/550 (99%), Strand = Plus / Plus

Query:   450 GAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGAGG 509
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:     1 GAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGAGG 60

Query:   510 TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGGCTC 569
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    61 TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGGCTC 120



65 



Query: 


450 


Sbjct: 


1 


Que ry : 


510 


Sbjct : 


61 


Query: 


570 


Sbjct : 


121 


Query: 


630 


Sbjct : 


181 


Query: 


690 



ACAATGTCAGCCGCCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCT 629 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
60 Sbjct: 121 ACAATGTCAGCCGCCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCT 180 



CAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCG 6 89 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIII 

CAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCG 24 0 
AGTTCTGGCTGGGTCTGGAGAAGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGG 749 
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10 



15 



20 



Sbj Ct : 


241 


Query : 


750 


Sbjct : 


301 


Query: 


810 


Sbjct : 


361 


Query: 


870 


Sbjct : 


421 


Query : 


930 


Sbjct: 


481 


Query: 


990 


Sbjct: 


541 



lllllllllllllllllllllllll IIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIII 

AGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGG 3 00 
CCGTGCAGCTGCGGGACTGGGATGACAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGG 809 

IIIIIMIIIIIIIMMIIIIII lllllllllllllllllllllllllllllllllll 

CCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGG 3 60 
GTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCG 869 

MIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMII 

GTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCG 420 

CCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCCCCACTTGGGACCAGGATCACG 929 

IIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIII lllllllllllllllllllll 
CCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACG 480 

ACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCT 989 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIilll 
ACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCT 540 

GCAGCCATTC 999 

llllllllll 
GCAGCCATTC 550 



>s3aq: 188990257 Homo sapiens angiopoietin-related protein mRNA, complete cds
(AF153606.1: 99%/476, p=1.9e-259), 652 bp. (SEQ ID NO:100)
Length = 652

Minus Strand HSPs:

Score = 2403 (360.5 bits), Expect = 4.2e-103, P = 4.2e-103
Identities = 505/523 (96%), Positives = 505/523 (96%), Strand = Minus / Plus

Query:  520 AGGCTTGGCCACC-TCATGGTCTAGGTG-CTT-GTGGTCCAG-GAGGCCAAACTGGCTTT 465
            || | ||| | || ||| | || | ||| ||  ||   |||  |||||||  ||||||||
Sbjct:  128 AGCCCTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTT 185

Query:  464 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 405
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  186 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 245

Query:  404 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 345
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  246 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 305

Query:  344 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 285
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  306 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 365

Query:  284 GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 225
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  366 GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 425

Query:  224 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 165
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  426 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 485

Query:  164 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCGCCCTGAGCGCTCA 105
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  486 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCGCCCTGAGCGCTCA 545

Query:  104 GTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCA 45
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  546 GTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCA 605

Query:   44 TCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCT-CGGGG 1
            ||||||||||||||||||||||||||||||||||||||| |||||
Sbjct:  606 TCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCTTCGGGG 650
>s3aq: 164987939 Category E: Homo sapiens angiopoietin-related protein mRNA,
complete cds (AF153606.1: 100%/150, p=1.9e-084), 228 bp. (SEQ ID NO:101)
Length = 228

Minus Strand HSPs:

Score = 480 (72.0 bits), Expect = 2.7e-31, Sum P(2) = 2.7e-31
Identities = 96/96 (100%), Positives = 96/96 (100%), Strand = Minus / Plus

Query: 590 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 531
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 133 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 192

Query: 530 TTCTTCGGGCAGGCTTGGCCACCTCATGGTCTAGGT 495
           ||||||||||||||||||||||||||||||||||||
Sbjct: 193 TTCTTCGGGCAGGCTTGGCCACCTCATGGTCTAGGT 228

Score = 410 (61.5 bits), Expect = 2.7e-31, Sum P(2) = 2.7e-31
Identities = 86/91 (94%), Positives = 86/91 (94%), Strand = Minus / Plus

Query: 681 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 622
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   1 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 60

Query: 621 TGGCGCCTCTGAATTACTGTCCAGCCTCCAT 591
           |||||||||||||||| ||||||  || | |
Sbjct:  61 TGGCGCCTCTGAATTAATGTCCACTCTGCCT 91



Table 37. ClustalW alignment of CG57051-03 protein with related proteins. 



MSGAPTA GAALMLCAATAVLLSAQGGPVQSKS PR FAS WD E M NVL AHG L LQLGQG L R E H AE 
MSGAPTA GAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 
MSGAPTA GAALMLCAATAVLLSAJaaGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 



RTRSQLS ALERRLSACGSACQGTEGSTDLPLAPESRVDP EVLHS LQTQLKAQNSR IQQLF 
RTRSQLS ALERRLSACGSACQGTEGSTDL PLAPESRVD PE VLHS LQTQLKAQNSR IQQLF 
RTRSQLS ALERRLS AC GSACQGTEGSTD LP LAP ESRVDP EVLHS LQTQLKAQNSR I OO LF 



HKVAQQQ RHLEKQHLR IQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHMVSR 
HKVAQQQ RHLEKQHLR IQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHMVSR 
HKVAQQQ RHLEKQHLR I QHLQSQFGLLDH KHLDHEVAK PA R RK R LP EMAQPVDP AHN VS R 



CG570S1.03 

Q9HBV4 

CG57051-02 

CG57051-03 

Q9HBV4 

CG5705I-02 

CG57051-03 

Q9HBV4 

CG57051-02 



CG57051-03 
Q9HBV4 
CG57051-02 

CG57051.03 M 
Q9HBV4 
CG57051-02 



RLPRDCQELFQVGERQSGLFEIQPQGSPPFLVNCKMTS D 



HGGWTVI QRRHDGSVDFNRP 
GGWTVIQRRHDGSVDFNRP 
HGGWTVI QRRHDGSEilDFNRP 



EAYKAG FGDPHGEFWLGLEKVHS I TGD R NS R LAVQL R DWDglNA EL LQ F S VH LGG EDTAV 
^EAYKAG FGDPHGEFWLGLEKVHS I TGD RNS R LAVQL R DWD GN A EL LQ F S VH LGG EDTAY 
^EAYKAG FGDPHGEFWLGLEKVHS I TGD RNS R LAVQL RDWD GN A EL LQ F S VH LGG EDTAY 



CG5705I-03 

Q9HBV4 

CG57051-02 



S LQLTAP VAGQLGATTVPPSGLSVPFEJTWDQDHDLRRDKN OAKS LS 
S LQLTAP VAGQLGATTVPPSGLSVPFSTWDQDHDLRRDKN CAKS LS 
S LQLTAP VAGQLGATTVPPSGLSVPFSTWDQDHDLRRDKN OAK S LS 



APSVAQRPDHVPSP 



CG57051.03 - - - 
Q9HBV4 

CG57051-02 LTPj 



CG57051-03 

Q9HBV4 

CG57051-02 




IGGWWFGTCSHSN LNGQYFRS I PQQRQKLKKG I FWKT WRG R YYP LQATTML I QPMAA 
IgGWWFGTCSHSN LNGQYFRS IPQQRQKLKKGI FWKT WRG R YYP LQATTML I QPMAA 
SgGWWFGTCSHSNLNGQYFRS I PQQRQKLKKG I FWKTWRGR YYP LQATTML I OPIvL^^. 



Information for the ClustalW proteins: 
Accno        Common Name                            Length
CG57051-03   novel Angiopoietin-like protein        368
Q9HBV4       ANGIOPOIETIN-LIKE PROTEIN PP1158       406
CG57051-02   Angiopoietin-like protein-isoform 2    386

In the alignment shown above, black outlined amino acid residues indicate residues 
identically conserved between sequences (i.e., residues that may be required to preserve 
structural or functional properties); amino acid residues with a gray background are similar to 
one another between sequences, possessing comparable physical and/or chemical properties 
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar 
between sequences. 
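
The three residue categories described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group L, V, I, M is taken from the text, whereas the K/R and D/E groups and the function names are hypothetical additions, and the shading actually used in the ClustalW figure may follow a different similarity table.

```python
# Similarity groups: L/V/I/M is the example given in the text; K/R and D/E
# are assumptions added only so the sketch has more than one group.
SIMILAR_GROUPS = [set("LVIM"), set("KR"), set("DE")]


def classify_pair(a: str, b: str) -> str:
    """Classify one aligned column as 'identical', 'similar' or 'different'."""
    if a == "-" or b == "-":
        return "different"  # a gap is never conserved
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


def summarize_alignment(row1: str, row2: str) -> dict:
    """Count the three categories over two aligned rows of equal length."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    counts = {"identical": 0, "similar": 0, "different": 0}
    for a, b in zip(row1, row2):
        counts[classify_pair(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical five-column alignment, not taken from Table 37.
    print(summarize_alignment("MLKD-", "MVRAG"))
```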

Table 38. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-03. 



outside                            Certainty=0.7332 (Affirmative) < succ>
microbody (peroxisome)             Certainty=0.2527 (Affirmative) < succ>
endoplasmic reticulum (membrane)   Certainty=0.1000 (Affirmative) < succ>
endoplasmic reticulum (lumen)      Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
  max. C    31         0.306   0.37     NO
  max. Y    26         0.429   0.34     YES
  max. S    8          0.952   0.88     YES
  mean S    1-25       0.848   0.48     YES

# Most likely cleavage site between pos. 25 and 26: AQG-GP
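
The YES/NO column of the SignalP table is consistent with a simple comparison of each value against its cutoff. The following sketch reproduces that column from the four rows reported above; the rule "value greater than cutoff" is an assumption inferred from the table, not a statement of how SignalP itself is implemented, and the names are hypothetical.

```python
# Rows copied from the SignalP table above: (measure, position, value, cutoff).
SIGNALP_ROWS = [
    ("max. C", "31", 0.306, 0.37),
    ("max. Y", "26", 0.429, 0.34),
    ("max. S", "8", 0.952, 0.88),
    ("mean S", "1-25", 0.848, 0.48),
]


def conclusion(value: float, cutoff: float) -> str:
    """Assumed decision rule: a score above its cutoff is reported as YES."""
    return "YES" if value > cutoff else "NO"


def conclusions(rows):
    """Return (measure, conclusion) pairs for every row of the table."""
    return [(measure, conclusion(value, cutoff)) for measure, _pos, value, cutoff in rows]


if __name__ == "__main__":
    for measure, verdict in conclusions(SIGNALP_ROWS):
        print(f"{measure:7s} {verdict}")
```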
[Figure: Hydropathy plot for CG57051-03 with a window of 19; x-axis: Amino Acid Number]
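
A hydropathy profile with a 19-residue window can be approximated as a sliding average of per-residue hydropathy values. The sketch below assumes the Kyte-Doolittle scale, which the source does not name; the function name and the short input strings are hypothetical, and the output is not expected to reproduce the plotted curve exactly.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence.

    Returns an empty list when the sequence is shorter than the window.
    """
    if window < 1 or window > len(sequence):
        return []
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # First 32 residues of the protein shown in the alignments above.
    profile = hydropathy_profile("MSGAPTAGAALMLCAATAVLLSAQGGPVQSKS")
    print(len(profile), [round(x, 2) for x in profile[:3]])
```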



CG57051-04 is directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO: II) of 937 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 242 amino acid residues is
presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is
predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of
0.8200, and appears to be a signal protein (see Table 27 below).
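
The stated ORF coordinates and protein length can be cross-checked with simple arithmetic: the coding codons run from the first base of the ATG up to, but not including, the first base of the stop codon. The helper below is a hypothetical sketch of that check using the coordinates quoted above.

```python
def orf_codon_count(start_codon_first_nt: int, stop_codon_first_nt: int) -> int:
    """Number of coding codons between the start codon (inclusive) and the
    stop codon (exclusive), using 1-based nucleotide coordinates."""
    span = stop_codon_first_nt - start_codon_first_nt
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


if __name__ == "__main__":
    # ATG at nucleotides 155-157, TAG at nucleotides 881-883 (from the text).
    print(orf_codon_count(155, 881))  # 242, matching the stated protein length
```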



SECP Nucleic Acids 

The novel nucleic acids of the invention include those that encode a SECP or SECP-like 
protein, or biologically-active portions thereof. The nucleic acids include nucleic acids encoding 
polypeptides that include the amino acid sequence of one or more of SEQ ID NO:1, 3, 5, 7, 9,
11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. The encoded polypeptides can thus include,
e.g., the amino acid sequences of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50,
52, 54 and 56. In some embodiments, a SECP polypeptide or protein, as disclosed herein,
includes the product of a naturally-occurring polypeptide, precursor form, pro-protein, or mature
form of the polypeptide. The naturally-occurring polypeptide, precursor, or pro-protein includes,
e.g., the full-length gene product, encoded by the corresponding gene. The naturally-occurring
polypeptide also includes the polypeptide, precursor or pro-protein encoded by an open reading
frame (ORF) described herein. As used herein, the term "identical" residues corresponds to
those residues in a comparison between two sequences where the equivalent nucleotide base or
amino acid residue in an alignment of two sequences is the same residue. Residues are
alternatively described as "similar" or "positive" when the comparisons between two sequences
in an alignment show that residues in an equivalent position in a comparison are either the same
amino acid residue or a conserved amino acid residue, as defined below.

As used herein, a "mature" form of a polypeptide or protein disclosed in the present 
invention is the product of a naturally occurring polypeptide or precursor form or proprotein. 
The naturally occurring polypeptide, precursor or proprotein includes, by way of nonlimiting
example, the full length gene product, encoded by the corresponding gene. Alternatively, it may
be defined as the polypeptide, precursor or proprotein encoded by an open reading frame
described herein. The "mature" form arises, again by way of nonlimiting example, as a
result of one or more naturally occurring processing steps as they may take place within the cell,
or host cell, in which the gene product arises. Examples of such processing steps leading to a
"mature" form of a polypeptide or protein include the cleavage of the amino-terminal methionine
residue encoded by the initiation codon of an open reading frame, or the proteolytic cleavage of a
signal peptide or leader sequence. Thus, a mature form arising from a precursor polypeptide or
protein that has residues 1 to N, where residue 1 is the amino-terminal methionine, would have
residues 2 through N remaining after removal of the amino-terminal methionine. Alternatively, a
mature form arising from a precursor polypeptide or protein having residues 1 to N, in which an
amino-terminal signal sequence from residue 1 to residue M is cleaved, would have the residues
from residue M+1 to residue N remaining. Further, as used herein, a "mature" form of a
polypeptide or protein may arise from a step of post-translational modification other than a
proteolytic cleavage event. Such additional processes include, by way of non-limiting example,
glycosylation, myristoylation or phosphorylation. In general, a mature polypeptide or protein
may result from the operation of only one of these processes, or a combination of any of them.
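
The two proteolytic routes to a mature form described above (residues 2 through N after removal of the initiator methionine, or residues M+1 through N after signal peptide cleavage) map directly onto string slicing. The sketch below uses hypothetical function names; the example precursor is the first 32 residues shown in the alignments earlier in this section, and M = 25 is the cleavage position predicted in the SignalP table, so the result is illustrative only.

```python
def remove_initiator_met(precursor: str) -> str:
    """Residues 2..N: drop the amino-terminal methionine if present."""
    return precursor[1:] if precursor.startswith("M") else precursor


def cleave_signal_peptide(precursor: str, m: int) -> str:
    """Residues M+1..N: drop a signal sequence spanning residues 1..M."""
    if not 0 < m < len(precursor):
        raise ValueError("M must lie strictly inside the precursor")
    return precursor[m:]


if __name__ == "__main__":
    precursor = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKS"  # N-terminal fragment only
    print(remove_initiator_met(precursor))       # starts at residue 2
    print(cleave_signal_peptide(precursor, 25))  # starts at residue 26
```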

In some embodiments, a nucleic acid encoding a polypeptide having the amino acid
sequence of one or more of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53,
55 and 57, includes the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17,
XX, AA, CC, EE, GG, II, KK, MM, and/or OO, or a fragment thereof. Additionally, the
invention includes mutant or variant nucleic acids of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15,
17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a fragment thereof, any of whose bases may be
changed from the disclosed sequence while still encoding a protein that maintains its SECP-like
biological activities and physiological functions. The invention further includes the complement
of the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48,
50, 52, 54 and 56, including fragments, derivatives, analogs and homologs thereof. The
invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto,
whose structures include chemical modifications.

Also included are nucleic acid fragments sufficient for use as hybridization probes to
identify SECP-encoding nucleic acids (e.g., SECP mRNA) and fragments for use as polymerase
chain reaction (PCR) primers for the amplification or mutation of SECP nucleic acid molecules.
As used herein, the term "nucleic acid molecule" is intended to include DNA molecules (e.g.,
cDNA or genomic DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated
using nucleotide analogs, and derivatives, fragments, and homologs thereof. The nucleic acid
molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA.

The term "probes" refers to nucleic acid sequences of variable length, preferably between
at least about 10 nucleotides (nt), 100 nt, or as many as about, e.g., 6,000 nt, depending upon the
specific use. Probes are used in the detection of identical, similar, or complementary nucleic
acid sequences. Longer length probes are usually obtained from a natural or recombinant source,
are highly specific, and are much slower to hybridize than oligomers. Probes may be single- or
double-stranded, and may also be designed to have specificity in PCR, membrane-based
hybridization technologies, or ELISA-like technologies.

The term "isolated" nucleic acid molecule refers to a nucleic acid that is separated from other
nucleic acid molecules that are present in the natural source of the nucleic acid. Examples of
isolated nucleic acid molecules include, but are not limited to, recombinant DNA molecules
contained in a vector, recombinant DNA molecules maintained in a heterologous host cell,
partially or substantially purified nucleic acid molecules, and synthetic DNA or RNA molecules.
Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic acid
(i.e., sequences located at the 5'- and 3'-termini of the nucleic acid) in the genomic DNA of the
organism from which the nucleic acid is derived. For example, in various embodiments, the
isolated SECP nucleic acid molecule can contain less than approximately 50 kb, 25 kb, 5 kb,
4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic
acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an
"isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other
cellular material or culture medium when produced by recombinant techniques, or of chemical
precursors or other chemicals when chemically synthesized.

A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the
nucleotide sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and
56, or a complement of any of these nucleotide sequences, can be isolated using standard
molecular biology techniques and the sequence information provided herein. Using all or a
portion of the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44,
46, 48, 50, 52, 54 and 56 as a hybridization probe, SECP nucleic acid sequences can be isolated
using standard hybridization and cloning techniques (e.g., as described in Sambrook et al., eds.,
MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY, 1989; and Ausubel, et al., eds., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY, John Wiley & Sons, New York, NY, 1993).

A nucleic acid of the invention can be amplified using cDNA, mRNA or, alternatively,
genomic DNA, as a template and appropriate oligonucleotide primers according to standard PCR
amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector 
and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to 
SECP nucleotide sequences can be prepared by standard synthetic techniques, e.g., using an 
automated DNA synthesizer. 

As used herein, the term "oligonucleotide" refers to a series of linked nucleotide residues,
which oligonucleotide has a sufficient number of nucleotide bases to be used in a PCR reaction.
A short oligonucleotide sequence may be based on, or designed from, a genomic or cDNA
sequence and is used to amplify, confirm, or reveal the presence of an identical, similar or
complementary DNA or RNA in a particular cell or tissue. Oligonucleotides comprise portions
of a nucleic acid sequence having about 10 nt, 50 nt, or 100 nt in length, preferably about 15 nt
to 30 nt in length. In one embodiment, an oligonucleotide comprising a nucleic acid molecule
less than 100 nt in length would further comprise at least 6 contiguous nucleotides of any of
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a complement
thereof. Oligonucleotides may be chemically synthesized and may also be used as probes.
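
The "at least 6 contiguous nucleotides" criterion above can be checked mechanically by testing every 6-nt window of the oligonucleotide against a reference sequence. The following is a minimal sketch with a hypothetical function name; the reference string is a short fragment copied from the Table 36 alignment and the oligonucleotides are invented for illustration. Only the given strand is tested, not its complement.

```python
def shares_contiguous_run(oligo: str, reference: str, min_run: int = 6) -> bool:
    """True if any window of `min_run` contiguous nucleotides of `oligo`
    occurs in `reference` (same strand, case-insensitive)."""
    oligo, reference = oligo.upper(), reference.upper()
    return any(
        oligo[i:i + min_run] in reference
        for i in range(len(oligo) - min_run + 1)
    )


if __name__ == "__main__":
    reference = "GAATTCAGCATCTGCAAAGCC"  # fragment of the query in Table 36
    print(shares_contiguous_run("TTTTCAGCATAAAA", reference))  # shares CAGCAT
    print(shares_contiguous_run("TTTTTTTTTT", reference))
```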

In another embodiment, an isolated nucleic acid molecule of the invention comprises a
nucleic acid molecule that is a complement of the nucleotide sequence shown in any of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. In still another
embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid
molecule that is a complement of the nucleotide sequence shown in any of SEQ ID NO:1, 3, 5, 7,
9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a portion of this nucleotide sequence.
A nucleic acid molecule that is complementary to the nucleotide sequence shown in any of
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 is one that is
sufficiently complementary to that nucleotide sequence that it can hydrogen bond to it with
few or no mismatches, thereby forming a stable duplex.

As used herein, the term "complementary" refers to Watson-Crick or Hoogsteen
base-pairing between nucleotide units of a nucleic acid molecule, whereas the term "binding" is
defined as the physical or chemical interaction between two polypeptides or compounds or
associated polypeptides or compounds or combinations thereof. Binding includes ionic,
non-ionic, van der Waals, hydrophobic interactions, and the like. A physical interaction can be
either direct or indirect. Indirect interactions may be through or due to the effects of another
polypeptide or compound. Direct binding refers to interactions that do not take place through, or
due to, the effect of another polypeptide or compound, but instead are without other substantial
chemical intermediates.
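
For Watson-Crick pairing, the complement of a strand and the number of mismatches in a candidate duplex can be computed as sketched below. This is an illustration only: Hoogsteen pairing is not modeled, only the unambiguous DNA bases A, C, G and T are handled, and the function names are hypothetical.

```python
# Watson-Crick partners for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def duplex_mismatches(strand: str, other: str) -> int:
    """Count mismatched positions when `other` is annealed antiparallel to
    `strand`; zero means the two strands are perfectly complementary."""
    if len(strand) != len(other):
        raise ValueError("this sketch only compares strands of equal length")
    partner = reverse_complement(other).upper()
    return sum(a != b for a, b in zip(strand.upper(), partner))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(duplex_mismatches("ATGC", "GCAT"))
    print(duplex_mismatches("ATGC", "GCAA"))
```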

Additionally, the nucleic acid molecule of the invention can comprise only a portion of
the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, XX, AA, CC, EE,
GG, II, KK, MM, and OO, e.g., a fragment that can be used as a probe or primer, or a fragment
encoding a biologically active portion of SECP. Fragments provided herein are defined as
sequences of at least 6 (contiguous) nucleic acids or at least 4 (contiguous) amino acids, a length
sufficient to allow for specific hybridization in the case of nucleic acids or for specific
recognition of an epitope in the case of amino acids, respectively, and are at most some portion
less than a full length sequence. Fragments may be derived from any contiguous portion of a
nucleic acid or amino acid sequence of choice. Derivatives are nucleic acid sequences or amino
acid sequences formed from the native compounds either directly or by modification or partial
substitution. Analogs are nucleic acid sequences or amino acid sequences that have a structure
similar to, but not identical to, the native compound but differ from it in respect to certain
components or side chains. Analogs may be synthetic or from a different evolutionary origin and
may have a similar or opposite metabolic activity compared to wild-type.

Derivatives and analogs may be full-length or other than full-length, if the derivative or
analog contains a modified nucleic acid or amino acid, as described below. Derivatives or
analogs of the nucleic acids or proteins of the invention include, but are not limited to, molecules
comprising regions that are substantially homologous to the nucleic acids or proteins of the
invention, in various embodiments, by at least about 70%, 80%, 85%, 90%, 95%, 98%, or even
99% identity (with a preferred identity of 80-99%) over a nucleic acid or amino acid sequence of
identical size or when compared to an aligned sequence in which the alignment is done by a
computer homology program known in the art, or whose encoding nucleic acid is capable of
hybridizing to the complement of a sequence encoding the aforementioned proteins under
stringent, moderately stringent, or low stringent conditions. See, e.g., Ausubel, et al., CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, NY, 1993, and below.
An exemplary program is the Gap program (Wisconsin Sequence Analysis Package, Version 8
for UNIX, Genetics Computer Group, University Research Park, Madison, WI) using the default
settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2: 482-
489), which is incorporated herein by reference in its entirety.
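
The Smith and Waterman local alignment algorithm cited above can be sketched as a short dynamic-programming routine. This is not the Gap program and does not use its default settings: the match, mismatch and linear gap scores below are arbitrary assumptions chosen for illustration, and the function returns only the best local score rather than the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `a` against `b` (linear gap penalty).

    Each cell is the maximum of zero, the diagonal move (match or mismatch),
    and the two gap moves; the answer is the largest cell in the matrix.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # Hypothetical sequences sharing the local segment ACGT.
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```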

The term "homologous nucleic acid sequence" or "homologous amino acid sequence," or
variations thereof, refers to sequences characterized by a homology at the nucleotide level or
amino acid level as previously discussed. Homologous nucleotide sequences encode those
sequences coding for isoforms of SECP polypeptide. Isoforms can be expressed in different
tissues of the same organism as a result of, e.g., alternative splicing of RNA. Alternatively,
isoforms can be encoded by different genes. In the invention, homologous nucleotide sequences
include nucleotide sequences encoding for a SECP polypeptide of species other than humans,
including, but not limited to, mammals, and thus can include, e.g., mouse, rat, rabbit, dog, cat,
cow, horse, and other organisms. Homologous nucleotide sequences also include, but are not
limited to, naturally occurring allelic variations and mutations of the nucleotide sequences set
forth herein. A homologous nucleotide sequence does not, however, include the nucleotide
sequence encoding human SECP protein. Homologous nucleic acid sequences include those
nucleic acid sequences that encode conservative amino acid substitutions (see below) in any of
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, as well as a
polypeptide having SECP activity. Biological activities of the SECP proteins are described
below. A homologous amino acid sequence does not encode the amino acid sequence of a
human SECP polypeptide.

The nucleotide sequence determined from the cloning of the human SECP gene allows
for the generation of probes and primers designed for use in identifying the cell types disclosed
and/or cloning SECP homologues in other cell types, e.g., from other tissues, as well as SECP
homologues from other mammals. The probe/primer typically comprises a substantially-purified
oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence that
hybridizes under stringent conditions to at least about 12, 25, 50, 100, 150, 200, 250, 300, 350 or
400 or more consecutive sense strand nucleotide sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13,
15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56; or an anti-sense strand nucleotide sequence of SEQ
ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or of a naturally
occurring mutant of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and
56.

Probes based upon the human SECP nucleotide sequence can be used to detect transcripts 
or genomic sequences encoding the same or homologous proteins. In various embodiments, the 
probe further comprises a label group attached thereto, e.g., the label group can be a 
radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be 
used as a part of a diagnostic test kit for identifying cells or tissue which mis-express a SECP
protein, such as by measuring a level of a SECP-encoding nucleic acid in a sample of cells from
a subject, e.g., detecting SECP mRNA levels or determining whether a genomic SECP gene has
been mutated or deleted.

The term "a polypeptide having a biologically-active portion of SECP" refers to
polypeptides exhibiting activity similar, but not necessarily identical to, an activity of a
polypeptide of the invention, including mature forms, as measured in a particular biological
assay, with or without dose dependency. A nucleic acid fragment encoding a "biologically-active
portion of SECP" can be prepared by isolating a portion of SEQ ID NO:1, 3, 5, 7, 9, 11,
13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 that encodes a polypeptide having a SECP
biological activity, expressing the encoded portion of SECP protein (e.g., by recombinant
expression in vitro), and assessing the activity of the encoded portion of SECP.

SECP Variants 

The invention further encompasses nucleic acid molecules that differ from the disclosed 
SECP nucleotide sequences due to degeneracy of the genetic code. These nucleic acids therefore 
encode the same SECP protein as those encoded by the nucleotide sequence shown in SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. In another embodiment, an
isolated nucleic acid molecule of the invention has a nucleotide sequence encoding a protein
having an amino acid sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56.
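
Degeneracy of the genetic code means that distinct nucleotide sequences can encode the same polypeptide. The sketch below demonstrates this with a deliberately partial codon table (entries from the standard genetic code for Met, Ser, Gly, Ala and the stop codons only); the two DNA strings are hypothetical and are not taken from any SEQ ID NO of this application.

```python
# Partial standard genetic code: enough codons for this illustration only.
CODONS = {
    "ATG": "M",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; raises KeyError for codons that are
    outside the partial table above."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    first = "ATGAGCGGCGCCTAG"
    second = "ATGTCTGGTGCATAG"  # differs from `first` at several bases
    print(translate(first), translate(second))
```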

In addition to the human SECP nucleotide sequence shown in any of SEQ ID NO:1, 3, 5,
7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, it will be appreciated by those skilled in
the art that DNA sequence polymorphisms that lead to changes in the amino acid sequences of
SECP may exist within a population (e.g., the human population). Such genetic polymorphism
in the SECP gene may exist among individuals within a population due to natural allelic
variation. As used herein, the terms "gene" and "recombinant gene" refer to nucleic acid
molecules comprising an open reading frame encoding a SECP protein, preferably a mammalian
SECP protein. Such natural allelic variations can typically result in 1-5% variance in the
nucleotide sequence of the SECP gene. Any and all such nucleotide variations and resulting
amino acid polymorphisms in SECP that are the result of natural allelic variation and that do not
alter the functional activity of SECP are intended to be within the scope of the invention.

Additionally, nucleic acid molecules encoding SECP proteins from other species, and
thus that have a nucleotide sequence that differs from the human sequence of any of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 are intended to be within the
scope of the invention. Nucleic acid molecules corresponding to natural allelic variants and
homologues of the SECP cDNAs of the invention can be isolated based on their homology to the
human SECP nucleic acids disclosed herein using the human cDNAs, or a portion thereof, as a
hybridization probe according to standard hybridization techniques under stringent hybridization
conditions.

In another embodiment, an isolated nucleic acid molecule of the invention is at least 6
nucleotides in length and hybridizes under stringent conditions to the nucleic acid molecule
comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44,
46, 48, 50, 52, 54 and/or 56. In another embodiment, the nucleic acid is at least 10, 25, 50, 100,
250, 500 or 750 nucleotides in length. In yet another embodiment, an isolated nucleic acid
molecule of the invention hybridizes to the coding region. As used herein, the term "hybridizes
under stringent conditions" is intended to describe conditions for hybridization and washing
under which nucleotide sequences at least 60% homologous to each other typically remain
hybridized to each other.

Homologs (i.e., nucleic acids encoding SECP proteins derived from species other than
human) or other related sequences (e.g., paralogs) can be obtained by low, moderate or high
stringency hybridization with all or a portion of the particular human sequence as a probe using
methods well known in the art for nucleic acid hybridization and cloning.

As used herein, the phrase "stringent hybridization conditions" refers to conditions under
which a probe, primer or oligonucleotide will hybridize to its target sequence, but to no other
sequences. Stringent conditions are sequence-dependent and will be different in different
circumstances. Longer sequences hybridize specifically at higher temperatures than shorter
sequences. Generally, stringent conditions are selected to be about 5°C lower than the thermal
melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the
temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of
the probes complementary to the target sequence hybridize to the target sequence at equilibrium.
Since the target sequences are generally present in excess, at Tm, 50% of the probes are occupied
at equilibrium. Typically, stringent conditions will be those in which the salt concentration is
less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion (or other salts) at pH
7.0 to 8.3 and the temperature is at least about 30°C for short probes, primers or oligonucleotides
(e.g., 10 nt to 50 nt) and at least about 60°C for longer probes, primers and oligonucleotides.
Stringent conditions may also be achieved with the addition of destabilizing agents, such as
formamide.
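
By way of illustration only, the relationship between Tm and a hybridization temperature about 5°C below it can be sketched as follows. The sketch estimates Tm for a short oligonucleotide with the Wallace rule (2°C per A or T, 4°C per G or C). That rule is an assumption introduced here for illustration and is not a formula given in this specification; the function names and the example probe are likewise hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Estimate Tm (deg C) of a short oligonucleotide with the Wallace rule.

    Assumption (not from the specification): Tm = 2*(A+T) + 4*(G+C).
    This is a rule of thumb that is only reasonable for short probes.
    """
    seq = oligo.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        raise ValueError("oligo must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(oligo: str, offset: float = 5.0) -> float:
    """Return a hybridization temperature about `offset` deg C below the estimated Tm."""
    return wallace_tm(oligo) - offset


if __name__ == "__main__":
    probe = "ATGGCCAAGCTTGACCTG"  # hypothetical 18-nt probe, not an SECP sequence
    print("estimated Tm:", wallace_tm(probe))
    print("suggested hybridization temperature:", stringent_temperature(probe))
```

For longer probes, or where ionic strength and formamide concentration matter, a more complete Tm model would be needed; the sketch only shows the arithmetic of selecting a temperature below Tm.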

Stringent conditions are known to those skilled in the art and can be found in Current
Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. Preferably,
the conditions are such that sequences at least about 65%, 70%, 75%, 85%, 90%, 95%, 98%, or
99% homologous to each other typically remain hybridized to each other. A non-limiting
example of stringent hybridization conditions is hybridization in a high salt buffer comprising
6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and
500 mg/ml denatured salmon sperm DNA at 65°C. This hybridization is followed by one or
more washes in 0.2X SSC, 0.01% BSA at 50°C. An isolated nucleic acid molecule of the
invention that hybridizes under stringent conditions to the sequence of any of SEQ ID NO:1, 3,
5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 corresponds to a naturally occurring
nucleic acid molecule. As used herein, a "naturally-occurring" nucleic acid molecule refers to an
RNA or DNA molecule having a nucleotide sequence that occurs in nature (e.g., encodes a
natural protein).

In a second embodiment, a nucleic acid sequence that is hybridizable to the nucleic acid
molecule comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17,
40, 42, 44, 46, 48, 50, 52, 54 and/or 56, or fragments, analogs or derivatives thereof, under
conditions of moderate stringency is provided. A non-limiting example of moderate stringency
hybridization conditions is hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and
100 mg/ml denatured salmon sperm DNA at 55°C, followed by one or more washes in 1X SSC,
0.1% SDS at 37°C. Other conditions of moderate stringency that may be used are well known in
the art. See, e.g., Ausubel, et al. (eds.), 1993. Current Protocols in Molecular Biology,
John Wiley & Sons, NY, and Kriegler, 1990. Gene Transfer and Expression, A Laboratory
Manual, Stockton Press, NY.

In a third embodiment, a nucleic acid that is hybridizable to the nucleic acid molecule
comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44,
46, 48, 50, 52, 54 and 56, or fragments, analogs or derivatives thereof, under conditions of low
stringency, is provided. A non-limiting example of low stringency hybridization conditions is
hybridization in 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.02% PVP,
0.02% Ficoll, 0.2% BSA, 100 mg/ml denatured salmon sperm DNA, 10% (wt/vol) dextran
sulfate at 40°C, followed by one or more washes in 2X SSC, 25 mM Tris-HCl (pH 7.4), 5 mM
EDTA, and 0.1% SDS at 50°C. Other conditions of low stringency that may be used are well
known in the art (e.g., as employed for cross-species hybridizations). See, e.g., Ausubel, et al.
(eds.), 1993. Current Protocols in Molecular Biology, John Wiley & Sons, NY, and
Kriegler, 1990. Gene Transfer and Expression, A Laboratory Manual, Stockton Press,
NY; Shilo and Weinberg, 1981. Proc. Natl. Acad. Sci. USA 78: 6789-6792.

Conservative Mutations 

In addition to naturally-occurring allelic variants of the SECP sequence that may exist in
the population, the skilled artisan will further appreciate that changes can be introduced by
mutation into the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56, thereby leading to changes in the amino acid sequence of the
encoded SECP protein, without altering the functional ability of the SECP protein. For example,
nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid
residues can be made in the sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56. A "non-essential" amino acid residue is a residue that can be
altered from the wild-type sequence of SECP without altering the biological activity, whereas an
"essential" amino acid residue is required for biological activity. For example, amino acid
residues that are conserved among the SECP proteins of the invention are predicted to be
particularly non-amenable to such alteration.

Amino acid residues that are conserved among members of a SECP family are
predicted to be less amenable to alteration. For example, a SECP protein according to the
invention can contain at least one domain that is a typically conserved region in a SECP family
member. As such, these conserved domains are not likely to be amenable to mutation. Other
amino acid residues, however (e.g., those that are not conserved or only semi-conserved among
members of the SECP family), may not be as essential for activity and thus are more likely to be
amenable to alteration.

Another aspect of the invention pertains to nucleic acid molecules encoding SECP
proteins that contain changes in amino acid residues that are not essential for activity. Such
SECP proteins differ in amino acid sequence from any of SEQ ID NO:2, 4, 6, 8, 10, 12,
14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, yet retain biological activity. In one
embodiment, the isolated nucleic acid molecule comprises a nucleotide sequence encoding a
protein, wherein the protein comprises an amino acid sequence at least about 75% homologous
to the amino acid sequence of any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49,
51, 53, 55 and 57. Preferably, the protein encoded by the nucleic acid is at least about 80%
homologous to any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and
57, more preferably at least about 90%, 95%, 98%, and most preferably at least about 99%
homologous to SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57.

An isolated nucleic acid molecule encoding a SECP protein homologous to the protein of
any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57 can be
created by introducing one or more nucleotide substitutions, additions or deletions into the
corresponding nucleotide sequence (i.e., SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46,
48, 50, 52, 54 and/or 56), such that one or more amino acid substitutions, additions or deletions
are introduced into the encoded protein.

Mutations can be introduced into SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46,
48, 50, 52, 54 and/or 56 by standard techniques, such as site-directed mutagenesis and
PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at one
or more predicted non-essential amino acid residues. A "conservative amino acid substitution" is
one in which the amino acid residue is replaced with an amino acid residue having a similar side
chain. Families of amino acid residues having similar side chains have been defined in the art.
These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic
side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine,
asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine,
valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), β-branched side
chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine,
tryptophan, histidine). Thus, a predicted nonessential amino acid residue in SECP is replaced
with another amino acid residue from the same side chain family. Alternatively, in another
embodiment, mutations can be introduced randomly along all or part of a SECP coding
sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for SECP
biological activity to identify mutants that retain activity. Following mutagenesis of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56, the encoded protein can
be expressed by any recombinant technology known in the art and the activity of the protein can
be determined.
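
By way of illustration only, the side chain families listed above can be encoded as a lookup table and used to test whether a proposed substitution stays within a family. The one-letter residue codes are standard; the function names and the rule that a substitution counts as conservative when the two residues share at least one listed family are assumptions made for this sketch.

```python
# Side chain families as listed in the text, in one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),               # lysine, arginine, histidine
    "acidic": set("DE"),               # aspartic acid, glutamic acid
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),       # threonine, valine, isoleucine
    "aromatic": set("YFWH"),
}


def shared_families(original: str, replacement: str) -> list:
    """Return the sorted names of the families that contain both residues."""
    a, b = original.upper(), replacement.upper()
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if a in members and b in members
    )


def is_conservative(original: str, replacement: str) -> bool:
    """Treat a substitution as conservative if the residues differ and share a family.

    Assumption: membership in any one common family is sufficient.
    """
    return original.upper() != replacement.upper() and bool(
        shared_families(original, replacement)
    )


if __name__ == "__main__":
    for pair in [("K", "R"), ("D", "K"), ("T", "V"), ("Y", "F")]:
        print(pair, is_conservative(*pair), shared_families(*pair))
```

Because some residues appear in more than one family (for example, threonine is both uncharged polar and β-branched), the lookup returns every family the two residues have in common.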

In one embodiment, a mutant SECP protein can be assayed for: (i) the ability to form
protein:protein interactions with other SECP proteins, other cell-surface proteins, or
biologically-active portions thereof; (ii) complex formation between a mutant SECP protein and a SECP
receptor; (iii) the ability of a mutant SECP protein to bind to an intracellular target protein or
biologically active portion thereof (e.g., avidin proteins); (iv) the ability to bind BRA protein; or
(v) the ability to specifically bind an anti-SECP protein antibody.

Antisense Nucleic Acids 

Another aspect of the invention pertains to isolated antisense nucleic acid molecules that
are hybridizable to or complementary to the nucleic acid molecule comprising the nucleotide
sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 or
fragments, analogs or derivatives thereof. An "antisense" nucleic acid comprises a nucleotide
sequence that is complementary to a "sense" nucleic acid encoding a protein, e.g.,
complementary to the coding strand of a double-stranded cDNA molecule or complementary to
an mRNA sequence. In specific aspects, antisense nucleic acid molecules are provided that
comprise a sequence complementary to at least about 10, 25, 50, 100, 250 or 500 nucleotides or
an entire SECP coding strand, or to only a portion thereof. Nucleic acid molecules encoding
fragments, homologs, derivatives and analogs of a SECP protein of any of SEQ ID NO:2, 4, 6, 8,
10, 12, 14, 16, 18, YY, BB, DD, FF, HH, JJ, LL, NN, and PP,
or antisense nucleic acids complementary to a SECP nucleic acid sequence of SEQ ID
NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, are additionally provided.

In one embodiment, an antisense nucleic acid molecule is antisense to a "coding region"
of the coding strand of a nucleotide sequence encoding SECP. The term "coding region" refers
to the region of the nucleotide sequence comprising codons which are translated into amino acid
residues (e.g., the protein coding region of a human SECP that corresponds to any of SEQ ID
NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57).

In another embodiment, the antisense nucleic acid molecule is antisense to a
"non-coding region" of the coding strand of a nucleotide sequence encoding SECP. The term
"non-coding region" refers to 5'- and 3'-terminal sequences which flank the coding region that are not
translated into amino acids (i.e., also referred to as 5' and 3' non-translated regions).

Given the coding strand sequences encoding the SECP proteins disclosed herein (e.g.,
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56), antisense
nucleic acids of the invention can be designed according to the rules of Watson and Crick or
Hoogsteen base-pairing. The antisense nucleic acid molecule can be complementary to the entire
coding region of SECP mRNA, but more preferably is an oligonucleotide that is antisense to
only a portion of the coding or non-coding region of SECP mRNA. For example, the antisense
oligonucleotide can be complementary to the region surrounding the translation start site of
SECP mRNA. An antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35,
40, 45 or 50 nucleotides in length. An antisense nucleic acid of the invention can be constructed
using chemical synthesis or enzymatic ligation reactions using procedures known in the art. For
example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically
synthesized using naturally-occurring nucleotides or variously modified nucleotides designed to
increase the biological stability of the molecules or to increase the physical stability of the
duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives
and acridine-substituted nucleotides can be used.
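
By way of illustration only, designing an antisense oligonucleotide by Watson-Crick base-pairing amounts to taking the reverse complement of a window of the coding (sense) strand. The sketch below assumes an unmodified DNA oligonucleotide; the function names, the window parameters and the example sequence are hypothetical and do not correspond to any sequence of the invention.

```python
# Watson-Crick complements; U is accepted so that an mRNA sequence can be supplied.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(sense: str) -> str:
    """Return the DNA reverse complement of a sense-strand DNA or RNA sequence."""
    seq = sense.upper()
    if any(base not in "ACGTU" for base in seq):
        raise ValueError("sequence may contain only A, C, G, T or U")
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_oligo(coding_strand: str, start: int, length: int = 20) -> str:
    """Return an antisense DNA oligo complementary to coding_strand[start:start+length].

    `start` is a 0-based offset into the coding strand (an assumption of this sketch).
    """
    if start < 0 or length <= 0 or start + length > len(coding_strand):
        raise ValueError("window must lie within the sequence")
    return reverse_complement(coding_strand[start:start + length])


if __name__ == "__main__":
    # Hypothetical coding strand with a short 5' leader before the ATG start codon.
    cdna = "GCCACCATGGCTAAGCTGTCCGTTCTGCTG"
    atg = cdna.find("ATG")
    # Oligo spanning the region surrounding the translation start site.
    print(antisense_oligo(cdna, max(atg - 6, 0), 20))
```

Modified nucleotides or backbone chemistries, such as the phosphorothioate derivatives mentioned above, would be chosen at synthesis and do not change the base sequence computed here.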

Examples of modified nucleotides that can be used to generate the antisense nucleic acid
include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine,
4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine,
inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine,
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil,
queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the
antisense nucleic acid can be produced biologically using an expression vector into which a
nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the
inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest,
described further in the following subsection).

The antisense nucleic acid molecules of the invention are typically administered to a
subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or
genomic DNA encoding a SECP protein to thereby inhibit expression of the protein, e.g., by
inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide
complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid
molecule that binds to DNA duplexes, through specific interactions in the major groove of the
double helix. An example of a route of administration of antisense nucleic acid molecules of the
invention includes direct injection at a tissue site. Alternatively, antisense nucleic acid
molecules can be modified to target selected cells and then administered systemically. For
example, for systemic administration, antisense molecules can be modified such that they
specifically bind to receptors or antigens expressed on a selected cell surface (e.g., by linking the
antisense nucleic acid molecules to peptides or antibodies that bind to cell surface receptors or
antigens). The antisense nucleic acid molecules can also be delivered to cells using the vectors
described herein. To achieve sufficient intracellular concentrations of antisense molecules,
vector constructs in which the antisense nucleic acid molecule is placed under the control of a
strong pol II or pol III promoter are preferred.

In yet another embodiment, the antisense nucleic acid molecule of the invention is an
α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific
double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the
strands run parallel to each other (see, Gaultier, et al., 1987. Nucl. Acids Res. 15: 6625-6641).
The antisense nucleic acid molecule can also comprise a 2'-O-methylribonucleotide (Inoue, et al.,
1987. Nucl. Acids Res. 15: 6131-6148) or a chimeric RNA-DNA analogue (Inoue, et al., 1987.
FEBS Lett. 215: 327-330).

Ribozymes and PNA Moieties 

Such modifications include, by way of non-limiting example, modified bases, and nucleic 
acids whose sugar phosphate backbones are modified or derivatized. These modifications are 
carried out at least in part to enhance the chemical stability of the modified nucleic acid, such 
that they may be used, for example, as antisense binding nucleic acids in therapeutic applications
in a subject. 

In still another embodiment, an antisense nucleic acid of the invention is a ribozyme.
Ribozymes are catalytic RNA molecules with ribonuclease activity that are capable of cleaving a
single-stranded nucleic acid, such as an mRNA, to which they have a complementary region.
Thus, ribozymes (e.g., hammerhead ribozymes; described by Haseloff and Gerlach, 1988.
Nature 334: 585-591) can be used to catalytically cleave SECP mRNA transcripts to thereby
inhibit translation of SECP mRNA. A ribozyme having specificity for a SECP-encoding nucleic
acid can be designed based upon the nucleotide sequence of a SECP DNA disclosed herein (i.e.,
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56). For example, a
derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence
of the active site is complementary to the nucleotide sequence to be cleaved in a SECP-encoding
mRNA. See, e.g., Cech, et al., U.S. Patent No. 4,987,071; and Cech, et al., U.S. Patent No.
5,116,742. Alternatively, SECP mRNA can be used to select a catalytic RNA having a specific
ribonuclease activity from a pool of RNA molecules (Bartel, et al., 1993. Science 261:
1411-1418).

Alternatively, SECP gene expression can be inhibited by targeting nucleotide sequences
complementary to the regulatory region of the SECP (e.g., the SECP promoter and/or enhancers)
to form triple helical structures that prevent transcription of the SECP gene in target cells. See,
e.g., Helene, 1991. Anticancer Drug Des. 6: 569-84; Helene, et al., 1992. Ann. N.Y. Acad. Sci.
660: 27-36; and Maher, 1992. Bioessays 14: 807-15.

In various embodiments, the nucleic acids of SECP can be modified at the base moiety,
sugar moiety or phosphate backbone to improve, e.g., the stability, hybridization, or solubility of
the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can be
modified to generate peptide nucleic acids (Hyrup, et al., 1996. Bioorg. Med. Chem. 4: 5-23). As
used herein, the terms "peptide nucleic acids" or "PNAs" refer to nucleic acid mimics, e.g., DNA
mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone
and only the four natural nucleobases are retained. The neutral backbone of PNAs has been
shown to allow for specific hybridization to DNA and RNA under conditions of low ionic
strength. The synthesis of PNA oligomers can be performed using standard solid phase peptide
synthesis protocols as described in Hyrup, et al., 1996. supra; Perry-O'Keefe, et al., 1996. Proc.
Natl. Acad. Sci. USA 93: 14670-14675.

PNAs of SECP can be used in therapeutic and diagnostic applications. For example,
PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene
expression by, e.g., inducing transcription or translation arrest or inhibiting replication. PNAs of
SECP can also be used, e.g., in the analysis of single base pair mutations in a gene by, e.g., PNA
directed PCR clamping; as artificial restriction enzymes when used in combination with other
enzymes, e.g., S1 nucleases (see, Hyrup, et al., 1996, supra); or as probes or primers for DNA
sequence and hybridization (see, Hyrup, et al., 1996; Perry-O'Keefe, et al., 1996, supra).

In another embodiment, PNAs of SECP can be modified, e.g., to enhance their stability
or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the formation of
PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery known in
the art. For example, PNA-DNA chimeras of SECP can be generated that may combine the
advantageous properties of PNA and DNA. Such chimeras allow DNA recognition enzymes,
e.g., RNase H and DNA polymerases, to interact with the DNA portion while the PNA portion
would provide high binding affinity and specificity. PNA-DNA chimeras can be linked using
linkers of appropriate lengths selected in terms of base stacking, number of bonds between the
nucleobases, and orientation (see, Hyrup, et al., 1996, supra). The synthesis of PNA-DNA chimeras
can be performed as described in Finn, et al. (1996. Nucl. Acids Res. 24: 3357-3363). For
example, a DNA chain can be synthesized on a solid support using standard phosphoramidite
coupling chemistry, and modified nucleoside analogs, e.g.,
5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite, can be used between the PNA and the 5' end of DNA
(Mag, et al., 1989. Nucl. Acid Res. 17: 5973-5988). PNA monomers are then coupled in a
stepwise manner to produce a chimeric molecule with a 5' PNA segment and a 3' DNA segment
(see, Finn, et al., 1996, supra). Alternatively, chimeric molecules can be synthesized with a 5'
DNA segment and a 3' PNA segment. See, e.g., Petersen, et al., 1975. Bioorg. Med. Chem. Lett.
5: 1119-11124.

In other embodiments, the oligonucleotide may include other appended groups such as
peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across the
cell membrane (see, e.g., Letsinger, et al., 1989. Proc. Natl. Acad. Sci. U.S.A. 86: 6553-6556;
Lemaitre, et al., 1987. Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No. WO 88/09810) or
the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In addition,
oligonucleotides can be modified with hybridization-triggered cleavage agents (see, e.g., Krol, et
al., 1988. BioTechniques 6: 958-976) or intercalating agents (see, e.g., Zon, 1988. Pharm. Res. 5:
539-549). To this end, the oligonucleotide may be conjugated to another molecule, e.g., a
peptide, a hybridization-triggered cross-linking agent, a transport agent, a hybridization-triggered
cleavage agent, and the like.

Characterization of SECP Polypeptides 

A polypeptide according to the invention includes a polypeptide including the amino acid
sequence of SECP polypeptides whose sequences are provided in SEQ ID NO:2, 4, 6, 8, 10, 12,
14, 16, 18, YY, BB, DD, FF, HH, JJ, LL, NN and/or PP. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residues shown in
SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, YY, BB, DD, FF, HH, JJ, LL, NN and/or PP while still
encoding a protein that maintains its SECP activities and physiological functions, or a functional
fragment thereof.

In general, a SECP variant that preserves SECP-like function includes any variant in
which residues at a particular position in the sequence have been substituted by other amino
acids, and further includes the possibility of inserting an additional residue or residues between
two residues of the parent protein as well as the possibility of deleting one or more residues from
the parent sequence. Any amino acid substitution, insertion, or deletion is encompassed by the
invention. In favorable circumstances, the substitution is a conservative substitution as defined
above.

One aspect of the invention pertains to isolated SECP proteins, and biologically-active
portions thereof, or derivatives, fragments, analogs or homologs thereof. Also provided are
polypeptide fragments suitable for use as immunogens to raise anti-SECP antibodies. In one
embodiment, native SECP proteins can be isolated from cells or tissue sources by an appropriate
purification scheme using standard protein purification techniques. In another embodiment,
SECP proteins are produced by recombinant DNA techniques. Alternative to recombinant
expression, a SECP protein or polypeptide can be synthesized chemically using standard peptide
synthesis techniques.

An "isolated" or "purified" polypeptide or protein or biologically-active portion thereof is
substantially free of cellular material or other contaminating proteins from the cell or tissue
source from which the SECP protein is derived, or substantially free from chemical precursors or
other chemicals when chemically synthesized. The language "substantially free of cellular
material" includes preparations of SECP proteins in which the protein is separated from cellular
components of the cells from which it is isolated or recombinantly-produced. In one
embodiment, the language "substantially free of cellular material" includes preparations of SECP
proteins having less than about 30% (by dry weight) of non-SECP proteins (also referred to
herein as a "contaminating protein"), more preferably less than about 20% of non-SECP proteins,
still more preferably less than about 10% of non-SECP proteins, and most preferably less than
about 5% of non-SECP proteins. When the SECP protein or biologically-active portion thereof
is recombinantly-produced, it is also preferably substantially free of culture medium, i.e., culture
medium represents less than about 20%, more preferably less than about 10%, and most
preferably less than about 5% of the volume of the SECP protein preparation.

The phrase "substantially free of chemical precursors or other chemicals" includes
preparations of SECP protein in which the protein is separated from chemical precursors or other
chemicals that are involved in the synthesis of the protein. In one embodiment, the language
"substantially free of chemical precursors or other chemicals" includes preparations of SECP
protein having less than about 30% (by dry weight) of chemical precursors or non-SECP
chemicals, more preferably less than about 20% chemical precursors or non-SECP chemicals,
still more preferably less than about 10% chemical precursors or non-SECP chemicals, and most
preferably less than about 5% chemical precursors or non-SECP chemicals.

Biologically-active portions of a SECP protein include peptides comprising amino acid
sequences sufficiently homologous to or derived from the amino acid sequence of the SECP
protein which include fewer amino acids than the full-length SECP proteins, and exhibit at least
one activity of a SECP protein. Typically, biologically-active portions comprise a domain or
motif with at least one activity of the SECP protein. A biologically-active portion of a SECP
protein can be a polypeptide which is, for example, 10, 25, 50, 100 or more amino acids in
length.

A biologically-active portion of a SECP protein of the invention may contain at least one 
of the above-identified conserved domains. Moreover, other biologically active portions, in 
which other regions of the protein are deleted, can be prepared by recombinant techniques and 
evaluated for one or more of the functional activities of a native SECP protein. 

In an embodiment, the SECP protein has an amino acid sequence shown in any of SEQ
ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56. In other
embodiments, the SECP protein is substantially homologous to any of SEQ ID NO:1, 3, 5, 7, 9,
11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 and retains the functional activity of the
protein of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56
yet differs in amino acid sequence due to natural allelic variation or mutagenesis, as described in
detail below. Accordingly, in another embodiment, the SECP protein is a protein that comprises
an amino acid sequence at least about 45% homologous, and more preferably about 55, 65, 70,
75, 80, 85, 90, 95, 98 or even 99% homologous to the amino acid sequence of any of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 and retains the functional
activity of the SECP proteins of the corresponding polypeptide having the sequence of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56.

Determining Homology Between Two or More Sequences 

To determine the percent homology of two amino acid sequences or of two nucleic acids,
the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the
sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second
amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino
acid positions or nucleotide positions are then compared. When a position in the first sequence
is occupied by the same amino acid residue or nucleotide as the corresponding position in the
second sequence, then the molecules are homologous at that position (i.e., as used herein amino
acid or nucleic acid "homology" is equivalent to amino acid or nucleic acid "identity").

The nucleic acid sequence homology may be determined as the degree of identity
between two sequences. The homology may be determined using computer programs known in
the art, such as GAP software provided in the GCG program package. See, Needleman and
Wunsch, 1970. J. Mol. Biol. 48: 443-453. Using GCG GAP software with the following settings
for nucleic acid sequence comparison: GAP creation penalty of 5.0 and GAP extension penalty
of 0.3, the coding region of the analogous nucleic acid sequences referred to above exhibits a
degree of identity preferably of at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%, with the
CDS (encoding) part of the DNA sequence shown in SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40,
42, 44, 46, 48, 50, 52, 54 and/or 56.

The term "sequence identity" refers to the degree to which two polynucleotide or
polypeptide sequences are identical on a residue-by-residue basis over a particular region of
comparison. The term "percentage of sequence identity" is calculated by comparing two
optimally aligned sequences over that region of comparison, determining the number of positions
at which the identical nucleic acid base (e.g., A, T, C, G, U, or I, in the case of nucleic acids)
occurs in both sequences to yield the number of matched positions, dividing the number of
matched positions by the total number of positions in the region of comparison (i.e., the window
size), and multiplying the result by 100 to yield the percentage of sequence identity. The term
"substantial identity" as used herein denotes a characteristic of a polynucleotide sequence,
wherein the polynucleotide comprises a sequence that has at least 80 percent sequence identity,
preferably at least 85 percent identity and often 90 to 95 percent sequence identity, more usually
at least 99 percent sequence identity as compared to a reference sequence over a comparison
region.
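
By way of illustration only, the percentage of sequence identity defined above can be computed from two sequences that have already been optimally aligned. The sketch assumes that the aligned sequences are supplied as equal-length strings with gaps written as "-", and that a gap never counts as a match. The function names and the example alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over the region of comparison.

    Matched positions are divided by the total number of positions in the
    region of comparison (the window size) and multiplied by 100.
    Assumption: inputs are pre-aligned, equal-length strings with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("region of comparison must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def has_substantial_identity(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if identity meets the 'at least 80 percent' floor stated in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    a = "ATGGCC-AAGCT"  # hypothetical aligned sequences
    b = "ATGGCCTAAGGT"
    print(percent_identity(a, b), has_substantial_identity(a, b))
```

Producing the optimal alignment itself (for example, with the GAP program and the penalties given above) is a separate step that this sketch does not attempt.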

Chimeric and Fusion Proteins

The invention also provides SECP chimeric or fusion proteins. As used herein, a SECP
"chimeric protein" or "fusion protein" comprises a SECP polypeptide operatively-linked to a
non-SECP polypeptide. A "SECP polypeptide" refers to a polypeptide having an amino acid
sequence corresponding to a SECP protein shown in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18,
YY, BB, DD, FF, HH, JJ, LL, NN and/or PP, whereas a "non-SECP polypeptide" refers to a
polypeptide having an amino acid sequence corresponding to a protein that is not substantially
homologous to the SECP protein (e.g., a protein that is different from the SECP protein and that
is derived from the same or a different organism). Within a SECP fusion protein the SECP
polypeptide can correspond to all or a portion of a SECP protein. In one embodiment, a SECP
fusion protein comprises at least one biologically-active portion of a SECP protein. In another
embodiment, a SECP fusion protein comprises at least two biologically-active portions of a
SECP protein. In yet another embodiment, a SECP fusion protein comprises at least three
biologically-active portions of a SECP protein. Within the fusion protein, the term
"operatively-linked" is intended to indicate that the SECP polypeptide and the non-SECP
polypeptide are fused in-frame with one another. The non-SECP polypeptide can be fused to the
amino-terminus or carboxyl-terminus of the SECP polypeptide.
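
By way of non-limiting illustration, the requirement that two coding sequences be fused
"in-frame" may be sketched as a simple check on codon boundaries. The function name
fuse_in_frame and its arguments are hypothetical; the sketch assumes that each input is a DNA
coding sequence beginning at the first base of a codon, and it is not a substitute for sequence
verification of an actual construct.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream_cds: str, downstream_cds: str) -> str:
    """Join two coding sequences so that the downstream partner stays in frame."""
    up = upstream_cds.upper()
    down = downstream_cds.upper()
    # Drop a terminal stop codon from the upstream partner so that
    # translation can read through into the downstream partner.
    if len(up) % 3 == 0 and up[-3:] in STOP_CODONS:
        up = up[:-3]
    if len(up) % 3 != 0:
        raise ValueError("upstream sequence is not a whole number of codons")
    if len(down) % 3 != 0:
        raise ValueError("downstream sequence is not a whole number of codons")
    # An internal stop codon upstream would truncate the fusion protein.
    codons = [up[i:i + 3] for i in range(0, len(up), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("upstream sequence contains an internal stop codon")
    return up + down
```

In this sketch, either partner may be supplied as the upstream sequence, consistent with fusion
at the amino-terminus or the carboxyl-terminus of the SECP polypeptide.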

In one embodiment, the fusion protein is a GST-SECP fusion protein in which the SECP
sequences are fused to the carboxyl-terminus of the GST (glutathione S-transferase) sequences.
Such fusion proteins can facilitate the purification of recombinant SECP polypeptides.

In another embodiment, the fusion protein is a SECP protein containing a heterologous
signal sequence at its amino-terminus. In certain host cells (e.g., mammalian host cells),
expression and/or secretion of SECP can be increased through use of a heterologous signal
sequence.

In yet another embodiment, the fusion protein is a SECP-immunoglobulin fusion protein
in which the SECP sequences are fused to sequences derived from a member of the
immunoglobulin protein family. The SECP-immunoglobulin fusion proteins of the invention can
be incorporated into pharmaceutical compositions and administered to a subject to inhibit an
interaction between a SECP ligand and a SECP protein on the surface of a cell, to thereby
suppress SECP-mediated signal transduction in vivo. The SECP-immunoglobulin fusion
proteins can be used to affect the bioavailability of a SECP cognate ligand. Inhibition of the
SECP ligand/SECP interaction may be useful therapeutically both for the treatment of
proliferative and differentiative disorders and for modulating (e.g., promoting or inhibiting)
cell survival. Moreover, the SECP-immunoglobulin fusion proteins of the invention can be used
as immunogens to produce anti-SECP antibodies in a subject, to purify SECP ligands, and in
screening assays to identify molecules that inhibit the interaction of SECP with a SECP ligand.

A SECP chimeric or fusion protein of the invention can be produced by standard
recombinant DNA techniques. For example, DNA fragments coding for the different
polypeptide sequences are ligated together in-frame in accordance with conventional techniques,
e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction enzyme
digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline
phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another
embodiment, the fusion gene can be synthesized by conventional techniques including automated
DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using
anchor primers that give rise to complementary overhangs between two consecutive gene
fragments that can subsequently be annealed and reamplified to generate a chimeric gene
sequence (see, e.g., Ausubel, et al. (eds.) Current Protocols in Molecular Biology, John
Wiley & Sons, 1992). Moreover, many expression vectors are commercially available that
already encode a fusion moiety (e.g., a GST polypeptide). A SECP-encoding nucleic acid can be
cloned into such an expression vector such that the fusion moiety is linked in-frame to the SECP
protein.

SECP Agonists and Antagonists

The invention also pertains to variants of the SECP proteins that function as either SECP
agonists (i.e., mimetics) or as SECP antagonists. Variants of the SECP protein can be generated
by mutagenesis (e.g., discrete point mutation or truncation of the SECP protein). An agonist of a
SECP protein can retain substantially the same, or a subset of, the biological activities of the
naturally-occurring form of a SECP protein. An antagonist of a SECP protein can inhibit one or
more of the activities of the naturally occurring form of a SECP protein by, for example,
competitively binding to a downstream or upstream member of a cellular signaling cascade
which includes the SECP protein. Thus, specific biological effects can be elicited by treatment
with a variant of limited function. In one embodiment, treatment of a subject with a variant
having a subset of the biological activities of the naturally occurring form of the protein has
fewer side effects in a subject relative to treatment with the naturally occurring form of the SECP
proteins.

Variants of the SECP proteins that function as either SECP agonists (i.e., mimetics) or as
SECP antagonists can be identified by screening combinatorial libraries of mutants (e.g.,
truncation mutants) of the SECP proteins for SECP protein agonist or antagonist activity. In one
embodiment, a variegated library of SECP variants is generated by combinatorial mutagenesis at
the nucleic acid level and is encoded by a variegated gene library. A variegated library of SECP
variants can be produced by, for example, enzymatically-ligating a mixture of synthetic
oligonucleotides into gene sequences such that a degenerate set of potential SECP sequences is
expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for
phage display) containing the set of SECP sequences therein. There are a variety of methods
which can be used to produce libraries of potential SECP variants from a degenerate
oligonucleotide sequence. Chemical synthesis of a degenerate gene sequence can be performed
in an automatic DNA synthesizer, and the synthetic gene then ligated into an appropriate
expression vector. Use of a degenerate set of genes allows for the provision, in one mixture, of
all of the sequences encoding the desired set of potential SECP sequences. Methods for
synthesizing degenerate oligonucleotides are well-known within the art. See, e.g., Narang, 1983.
Tetrahedron 39: 3; Itakura, et al., 1984. Annu. Rev. Biochem. 53: 323; Itakura, et al., 1984.
Science 198: 1056; Ike, et al., 1983. Nucl. Acids Res. 11: 477.
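
By way of non-limiting illustration, the set of sequences represented by a single degenerate
oligonucleotide may be enumerated as sketched below. The sketch assumes that degenerate
positions are written in the standard IUPAC nucleotide ambiguity codes; the function name
expand_degenerate is hypothetical.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> List[str]:
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    # One pool of allowed bases per position; a KeyError signals an unknown code.
    pools = [IUPAC[base] for base in oligo.upper()]
    return ["".join(bases) for bases in product(*pools)]
```

For example, the degenerate codon GAY expands to GAC and GAT, and the degenerate codon NNK
expands to 32 distinct codons, so that a single synthesis can provide, in one mixture, every
member of the desired set.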

Polypeptide Libraries 

In addition, libraries of fragments of the SECP protein coding sequences can be used to
generate a variegated population of SECP fragments for screening and subsequent selection of
variants of a SECP protein. In one embodiment, a library of coding sequence fragments can be
generated by treating a double-stranded PCR fragment of a SECP coding sequence with a
nuclease under conditions wherein nicking occurs only about once per molecule, denaturing the
double stranded DNA, renaturing the DNA to form double-stranded DNA that can include
sense/antisense pairs from different nicked products, removing single stranded portions from
reformed duplexes by treatment with S1 nuclease, and ligating the resulting fragment library into
an expression vector. By this method, expression libraries can be derived which encode
amino-terminal and internal fragments of various sizes of the SECP proteins.

Various techniques are known in the art for screening gene products of combinatorial
libraries made by point mutations or truncation, and for screening cDNA libraries for gene
products having a selected property. Such techniques are adaptable for rapid screening of the
gene libraries generated by the combinatorial mutagenesis of SECP proteins. The most widely
used techniques, which are amenable to high throughput analysis, for screening large gene
libraries typically include cloning the gene library into replicable expression vectors,
transforming appropriate cells with the resulting library of vectors, and expressing the
combinatorial genes under conditions in which detection of a desired activity facilitates isolation
of the vector encoding the gene whose product was detected. Recursive ensemble mutagenesis
(REM), a new technique that enhances the frequency of functional mutants in the libraries, can
be used in combination with the screening assays to identify SECP variants. See, e.g., Arkin and
Youvan, 1992. Proc. Natl. Acad. Sci. USA 89: 7811-7815; Delagrave, et al., 1993. Protein
Engineering 6: 327-331.

Anti-SECP Antibodies

The invention encompasses antibodies and antibody fragments, such as Fab or F(ab')2, that
bind immunospecifically to any of the SECP polypeptides of said invention.

An isolated SECP protein, or a portion or fragment thereof, can be used as an immunogen
to generate antibodies that bind to SECP polypeptides using standard techniques for polyclonal
and monoclonal antibody preparation. The full-length SECP proteins can be used or,
alternatively, the invention provides antigenic peptide fragments of SECP proteins for use as
immunogens. An antigenic SECP peptide comprises at least 4 amino acid residues of the
amino acid sequence shown in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, YY, BB, DD, FF, HH,
JJ, LL, NN and/or PP, and encompasses an epitope of SECP such that an antibody raised against
the peptide forms a specific immune complex with SECP. Preferably, the antigenic peptide
comprises at least 6, 8, 10, 15, 20, or 30 amino acid residues. Longer antigenic peptides are
sometimes preferable over shorter antigenic peptides, depending on use and according to
methods well known to someone skilled in the art.

In certain embodiments of the invention, at least one epitope encompassed by the
antigenic peptide is a region of SECP that is located on the surface of the protein (e.g., a
hydrophilic region). As a means for targeting antibody production, hydropathy plots showing
regions of hydrophilicity and hydrophobicity may be generated by any method well known in the
art, including, for example, the Kyte-Doolittle or the Hopp-Woods methods, either with or
without Fourier transformation (see, e.g., Hopp and Woods, 1981. Proc. Natl. Acad. Sci. USA 78:
3824-3828; Kyte and Doolittle, 1982. J. Mol. Biol. 157: 105-142, each incorporated herein by
reference in their entirety).
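
By way of non-limiting illustration, a Kyte-Doolittle hydropathy profile (without Fourier
transformation) may be sketched as a sliding-window average over the published per-residue
hydropathy values. The window length of 9 residues and the function names are assumptions made
for this sketch only; lower averages indicate more hydrophilic stretches, which are candidate
surface regions.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each window of `window` consecutive residues."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


def most_hydrophilic_window(protein: str, window: int = 9) -> Tuple[int, str]:
    """Return (0-based start, peptide) of the window with the lowest mean hydropathy."""
    protein = protein.upper()
    profile = hydropathy_profile(protein, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window]
```

A window selected in this manner is only a starting point for choosing an antigenic peptide;
surface exposure would still need to be confirmed by other means.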

As disclosed herein, SECP protein sequences of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18,
YY, BB, DD, FF, HH, JJ, LL, NN and/or PP, or derivatives, fragments, analogs, or homologs
thereof, may be utilized as immunogens in the generation of antibodies that
immunospecifically-bind these protein components. The term "antibody" as used herein refers to
immunoglobulin molecules and immunologically-active portions of immunoglobulin molecules,
i.e., molecules that contain an antigen binding site that specifically-binds (immunoreacts with) an
antigen, such as SECP. Such antibodies include, but are not limited to, polyclonal, monoclonal,
chimeric, single chain, Fab and F(ab')2 fragments, and an Fab expression library. In a specific
embodiment, antibodies to human SECP proteins are disclosed. Various procedures known
within the art may be used for the production of polyclonal or monoclonal antibodies to a SECP
protein sequence of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55
and/or 57, or a derivative, fragment, analog, or homolog thereof.

For the production of polyclonal antibodies, various suitable host animals (e.g., rabbit,
goat, mouse or other mammal) may be immunized by injection with the native protein, or a
synthetic variant thereof, or a derivative of the foregoing. An appropriate immunogenic
preparation can contain, for example, recombinantly-expressed SECP protein or a
chemically-synthesized SECP polypeptide. The preparation can further include an adjuvant. Various
adjuvants used to increase the immunological response include, but are not limited to, Freund's
(complete and incomplete), mineral gels (e.g., aluminum hydroxide), surface active substances
(e.g., lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, etc.),
human adjuvants such as Bacille Calmette-Guerin and Corynebacterium parvum, or similar
immunostimulatory agents. If desired, the antibody molecules directed against SECP can be
isolated from the mammal (e.g., from the blood) and further purified by well known techniques,
such as protein A chromatography to obtain the IgG fraction.

The term "monoclonal antibody" or "monoclonal antibody composition", as used herein,
refers to a population of antibody molecules that contain only one species of an antigen binding
site capable of immunoreacting with a particular epitope of SECP. A monoclonal antibody
composition thus typically displays a single binding affinity for a particular SECP protein with
which it immunoreacts. For preparation of monoclonal antibodies directed towards a particular
SECP protein, or derivatives, fragments, analogs or homologs thereof, any technique that
provides for the production of antibody molecules by continuous cell line culture may be
utilized. Such techniques include, but are not limited to, the hybridoma technique (see, e.g.,
Kohler & Milstein, 1975. Nature 256: 495-497); the trioma technique; the human B-cell
hybridoma technique (see, e.g., Kozbor, et al., 1983. Immunol. Today 4: 72) and the EBV
hybridoma technique to produce human monoclonal antibodies (see, e.g., Cole, et al., 1985. In:
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Human
monoclonal antibodies may be utilized in the practice of the invention and may be produced by
using human hybridomas (see, e.g., Cote, et al., 1983. Proc. Natl. Acad. Sci. USA 80: 2026-2030)
or by transforming human B-cells with Epstein Barr Virus in vitro (see, e.g., Cole, et al., 1985.
In: Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Each of
the above citations is incorporated herein by reference in its entirety.

According to the invention, techniques can be adapted for the production of single-chain
antibodies specific to a SECP protein (see, e.g., U.S. Patent No. 4,946,778). In addition,
methods can be adapted for the construction of Fab expression libraries (see, e.g., Huse, et al.,
1989. Science 246: 1275-1281) to allow rapid and effective identification of monoclonal Fab
fragments with the desired specificity for a SECP protein or derivatives, fragments, analogs or
homologs thereof. Non-human antibodies can be "humanized" by techniques well known in the
art. See, e.g., U.S. Patent No. 5,225,539. Antibody fragments that contain the idiotypes to a
SECP protein may be produced by techniques known in the art including, but not limited to:
(i) an F(ab')2 fragment produced by pepsin digestion of an antibody molecule; (ii) an Fab fragment
generated by reducing the disulfide bridges of an F(ab')2 fragment; (iii) an Fab fragment generated
by the treatment of the antibody molecule with papain and a reducing agent; and (iv) Fv
fragments.

Additionally, recombinant anti-SECP antibodies, such as chimeric and humanized
monoclonal antibodies, comprising both human and non-human portions, which can be made
using standard recombinant DNA techniques, are within the scope of the invention. Such
chimeric and humanized monoclonal antibodies can be produced by recombinant DNA
techniques known in the art, for example using methods described in International Application
No. PCT/US86/02269; European Patent Application No. 184,187; European Patent Application
No. 171,496; European Patent Application No. 173,494; PCT International Publication No. WO
86/01533; U.S. Patent No. 4,816,567; U.S. Pat. No. 5,225,539; European Patent Application No.
125,023; Better, et al., 1988. Science 240: 1041-1043; Liu, et al., 1987. Proc. Natl. Acad. Sci.
USA 84: 3439-3443; Liu, et al., 1987. J. Immunol. 139: 3521-3526; Sun, et al., 1987. Proc. Natl.
Acad. Sci. USA 84: 214-218; Nishimura, et al., 1987. Cancer Res. 47: 999-1005; Wood, et al.,
1985. Nature 314: 446-449; Shaw, et al., 1988. J. Natl. Cancer Inst. 80: 1553-1559;
Morrison (1985) Science 229: 1202-1207; Oi, et al. (1986) BioTechniques 4: 214; Jones, et al.,
1986. Nature 321: 522-525; Verhoeyen, et al., 1988. Science 239: 1534; and Beidler, et al.,
1988. J. Immunol. 141: 4053-4060. Each of the above citations is incorporated herein by
reference in its entirety.

In one embodiment, methods for the screening of antibodies that possess the desired
specificity include, but are not limited to, enzyme-linked immunosorbent assay (ELISA) and
other immunologically-mediated techniques known within the art. In a specific embodiment,
selection of antibodies that are specific to a particular domain of a SECP protein is facilitated by
generation of hybridomas that bind to the fragment of a SECP protein possessing such a domain.
Thus, antibodies that are specific for a desired domain within a SECP protein, or derivatives,
fragments, analogs or homologs thereof, are also provided herein.

Anti-SECP antibodies may be used in methods known within the art relating to the
localization and/or quantitation of a SECP protein (e.g., for use in measuring levels of the SECP
protein within appropriate physiological samples, for use in diagnostic methods, for use in
imaging the protein, and the like). In a given embodiment, antibodies for SECP proteins, or
derivatives, fragments, analogs or homologs thereof, that contain the antibody derived binding
domain, are utilized as pharmacologically-active compounds (hereinafter "Therapeutics").

An anti-SECP antibody (e.g., monoclonal antibody) can be used to isolate a SECP
polypeptide by standard techniques, such as affinity chromatography or immunoprecipitation.
An anti-SECP antibody can facilitate the purification of natural SECP polypeptide from cells and
of recombinantly-produced SECP polypeptide expressed in host cells. Moreover, an anti-SECP
antibody can be used to detect SECP protein (e.g., in a cellular lysate or cell supernatant) in order
to evaluate the abundance and pattern of expression of the SECP protein. Anti-SECP antibodies
can be used diagnostically to monitor protein levels in tissue as part of a clinical testing
procedure, e.g., to determine the efficacy of a given treatment regimen. Detection
can be facilitated by coupling (i.e., physically linking) the antibody to a detectable substance.
Examples of detectable substances include various enzymes, prosthetic groups, fluorescent
materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples
of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or
acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin
and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein,
fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or
phycoerythrin; an example of a luminescent material includes luminol; examples of
bioluminescent materials include luciferase, luciferin, and aequorin, and examples of suitable
radioactive material include ¹²⁵I, ³⁵S or ³H.

SECP Recombinant Expression Vectors and Host Cells 

Another aspect of the invention pertains to vectors, preferably expression vectors,
containing a nucleic acid encoding a SECP protein, or derivatives, fragments, analogs or
homologs thereof. As used herein, the term "vector" refers to a nucleic acid molecule capable of
transporting another nucleic acid to which it has been linked. One type of vector is a "plasmid",
which refers to a circular double stranded DNA loop into which additional DNA segments can
be ligated. Another type of vector is a viral vector, wherein additional DNA segments can be
ligated into the viral genome. Certain vectors are capable of autonomous replication in a host
cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication
and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are
integrated into the genome of a host cell upon introduction into the host cell, and thereby are
replicated along with the host genome. Moreover, certain vectors are capable of directing the
expression of genes to which they are operatively-linked. Such vectors are referred to herein as
"expression vectors". In general, expression vectors of utility in recombinant DNA techniques
are often in the form of plasmids. In the present Specification, "plasmid" and "vector" can be
used interchangeably, as the plasmid is the most commonly used form of vector. However, the
invention is intended to include such other forms of expression vectors, such as viral vectors
(e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve
equivalent functions.

The recombinant expression vectors of the invention comprise a nucleic acid of the
invention in a form suitable for expression of the nucleic acid in a host cell, which means that the
recombinant expression vectors include one or more regulatory sequences, selected on the basis
of the host cells to be used for expression, that are operatively-linked to the nucleic acid sequence
to be expressed. Within a recombinant expression vector, "operably-linked" is intended to mean
that the nucleotide sequence of interest is linked to the regulatory sequence(s) in a manner that
allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation
system or in a host cell when the vector is introduced into the host cell).

The phrase "regulatory sequence" is intended to include promoters, enhancers and other
expression control elements (e.g., polyadenylation signals). Such regulatory sequences are
described, for example, in Goeddel, Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include
those that direct constitutive expression of a nucleotide sequence in many types of host cell and
those that direct expression of the nucleotide sequence only in certain host cells (e.g.,
tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the
design of the expression vector can depend on such factors as the choice of the host cell to be
transformed, the level of expression of protein desired, etc. The expression vectors of the
invention can be introduced into host cells to thereby produce proteins or peptides, including
fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., SECP proteins,
mutant forms of SECP proteins, fusion proteins, etc.).

The recombinant expression vectors of the invention can be designed for expression of
SECP proteins in prokaryotic or eukaryotic cells. For example, SECP proteins can be expressed
in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors),
yeast cells or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene
Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif.
(1990). Alternatively, the recombinant expression vector can be transcribed and translated in
vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Expression of proteins in prokaryotes is most often carried out in Escherichia coli with
vectors containing constitutive or inducible promoters directing the expression of either fusion or
non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein,
usually to the amino terminus of the recombinant protein. Such fusion vectors typically serve
three purposes: (i) to increase expression of recombinant protein; (ii) to increase the solubility of
the recombinant protein; and (iii) to aid in the purification of the recombinant protein by acting
as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage
site is introduced at the junction of the fusion moiety and the recombinant protein to enable
separation of the recombinant protein from the fusion moiety subsequent to purification of the
fusion protein. Such enzymes, and their cognate recognition sequences, include Factor Xa,
thrombin, and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia
Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, Beverly,
Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST),
maltose E binding protein, or protein A, respectively, to the target recombinant protein.
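
By way of non-limiting illustration, the separation of a recombinant protein from its fusion
moiety at an engineered proteolytic cleavage site may be sketched as follows. The recognition
motifs in the table are commonly cited consensus sequences and are stated here as assumptions;
they are not taken from this Specification, and actual protease specificity is broader and
depends on sequence context. The names RECOGNITION_SITES, find_cleavage_sites and digest are
hypothetical.

```python
from typing import List, Tuple

# Assumed consensus motifs: (one-letter motif, residues of the motif that
# remain on the amino-terminal side of the cut).
RECOGNITION_SITES = {
    "Factor Xa": ("IEGR", 4),      # cut after the arginine
    "thrombin": ("LVPRGS", 4),     # cut between arginine and glycine
    "enterokinase": ("DDDDK", 5),  # cut after the lysine
}


def find_cleavage_sites(protein: str) -> List[Tuple[str, int]]:
    """Return (enzyme, 0-based cut position) pairs, ordered by position."""
    hits = []
    for enzyme, (motif, offset) in RECOGNITION_SITES.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((enzyme, start + offset))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


def digest(protein: str, enzyme: str) -> List[str]:
    """Split a fusion protein at every site recognized by the named enzyme."""
    cuts = [pos for name, pos in find_cleavage_sites(protein) if name == enzyme]
    bounds = [0] + cuts + [len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]
```

A check of this kind can also flag a recognition motif that happens to occur within the
recombinant protein itself, in which case a different protease site may be preferable.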

Examples of suitable inducible non-fusion Escherichia coli expression vectors include
pTrc (Amann et al., (1988) Gene 69:301-315) and pET 11d (Studier, et al., Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990)
60-89).

One strategy to maximize recombinant protein expression in Escherichia coli is to
express the protein in a host bacterium with an impaired capacity to proteolytically-cleave the
recombinant protein. See, e.g., Gottesman, Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, Calif. (1990) 119-128. Another strategy is to
alter the nucleic acid sequence of the nucleic acid to be inserted into an expression vector so that
the individual codons for each amino acid are those preferentially utilized in Escherichia coli
(see, e.g., Wada, et al., 1992. Nucl. Acids Res. 20: 2111-2118). Such alteration of nucleic acid
sequences of the invention can be carried out by standard DNA synthesis techniques.
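
By way of non-limiting illustration, the second strategy (replacing each codon with a
synonymous codon preferred by the host) may be sketched as follows. The genetic code table is
the standard one; the function names and the table of preferred codons passed by the caller are
hypothetical, and any real use would draw preferred codons from a measured codon-usage table
such as that of Wada, et al.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: Dict[str, str]) -> str:
    """Swap each codon for the caller's preferred synonymous codon, if any."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence is not a whole number of codons")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        # Guard against a table entry that would change the protein.
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError("preferred codon is not synonymous")
        recoded.append(new_codon)
    return "".join(recoded)
```

Because only synonymous codons are substituted, the encoded amino acid sequence is unchanged;
the recoded sequence can then be prepared by standard DNA synthesis techniques.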

In another embodiment, the SECP expression vector is a yeast expression vector.
Examples of vectors for expression in yeast Saccharomyces cerevisiae include pYepSec1
(Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30:
933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation,
San Diego, Calif.), and picZ (InVitrogen, Corp.; San Diego, Calif.).

Alternatively, SECP can be expressed in insect cells using baculovirus expression
vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g.,
SF9 cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL
series (Luckow and Summers, 1989. Virology 170: 31-39).

In yet another embodiment, a nucleic acid of the invention is expressed in mammalian
cells using a mammalian expression vector. Examples of mammalian expression vectors include
pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6:
187-195). When used in mammalian cells, the expression vector's control functions are often
provided by viral regulatory elements. For example, commonly used promoters are derived from
polyoma, adenovirus 2, cytomegalovirus, and simian virus 40 (SV 40). For other suitable
expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of
Sambrook, et al., Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor
Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

In another embodiment, the recombinant mammalian expression vector is capable of
directing expression of the nucleic acid preferentially in a particular cell type (e.g.,
tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific
regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific
promoters include the albumin promoter (liver-specific; see, Pinkert, et al., 1987. Genes Dev. 1:
268-277), lymphoid-specific promoters (see, Calame and Eaton, 1988. Adv. Immunol. 43:
235-275), in particular promoters of T cell receptors (see, Winoto and Baltimore, 1989. EMBO J.
8: 729-733) and immunoglobulins (see, Banerji, et al., 1983. Cell 33: 729-740; Queen and
Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter;
see, Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific
promoters (see, Edlund, et al., 1985. Science 230: 912-916), and mammary gland-specific
promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application
Publication No. 264,166). Developmentally-regulated promoters are also encompassed, e.g., the
murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein
promoter (see, Camper and Tilghman, 1989. Genes Dev. 3: 537-546).

The invention further provides a recombinant expression vector comprising a DNA
molecule of the invention cloned into the expression vector in an antisense orientation. That is,
the DNA molecule is operatively-linked to a regulatory sequence in a manner that allows for
expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to
SECP mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense
orientation can be chosen that direct the continuous expression of the antisense RNA molecule in
a variety of cell types, for instance viral promoters and/or enhancers, or regulatory sequences can
be chosen that direct constitutive, tissue specific or cell type specific expression of antisense
RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid
or attenuated virus in which antisense nucleic acids are produced under the control of a high
efficiency regulatory region, the activity of which can be determined by the cell type into which
the vector is introduced. For a discussion of the regulation of gene expression using antisense
genes see, e.g., Weintraub, et al., "Antisense RNA as a molecular tool for genetic analysis,"
Reviews-Trends in Genetics, Vol. 1(1) 1986.
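
By way of non-limiting illustration, the relationship between a coding (sense) DNA sequence and
the antisense RNA transcribed from the same sequence cloned in the antisense orientation may be
sketched as follows. The function names are hypothetical, and the sketch assumes an unambiguous
A/C/G/T sequence written 5' to 3'.

```python
# Translation table for complementing DNA bases (either letter case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_cds: str) -> str:
    """RNA complementary to the mRNA transcribed from `sense_cds`.

    Cloning the insert in the antisense orientation places its reverse
    complement downstream of the promoter, so the transcript is the reverse
    complement of the sense strand, with U in place of T.
    """
    return reverse_complement(sense_cds).upper().replace("T", "U")
```

For example, the sense sequence ATGGCC corresponds to the mRNA AUGGCC, and the antisense
transcript GGCCAU base-pairs with that mRNA along its full length.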

Another aspect of the invention pertains to host cells into which a recombinant
expression vector of the invention has been introduced. The terms "host cell" and "recombinant
host cell" are used interchangeably herein. It is understood that such terms refer not only to the
particular subject cell but also to the progeny or potential progeny of such a cell. Because certain
modifications may occur in succeeding generations due to either mutation or environmental
influences, such progeny may not, in fact, be identical to the parent cell, but are still included
within the scope of the term as used herein.

A host cell can be any prokaryotic or eukaryotic cell. For example, SECP protein can be 
expressed in bacterial cells such as Escherichia coli, insect cells, yeast or mammalian cells (such 
as Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are known to 
those skilled in the art.

Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional 
transformation or transfection techniques. As used herein, the terms "transformation" and 
"transfection" are intended to refer to a variety of art-recognized techniques for introducing 
foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium
chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation.
Suitable methods for transforming or transfecting host cells can be found in Sambrook, et al.
(Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), and other laboratory 
manuals. 

For stable transfection of mammalian cells, it is known that, depending upon the
expression vector and transfection technique used, only a small fraction of cells may integrate
the foreign DNA into their genome. In order to identify and select these integrants, a gene that
encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host
cells along with the gene of interest. Various selectable markers include those that confer
resistance to drugs, such as G418, hygromycin and methotrexate. Nucleic acid encoding a
selectable marker can be introduced into a host cell on the same vector as that encoding SECP or
can be introduced on a separate vector. Cells stably-transfected with the introduced nucleic acid
can be identified by drug selection (e.g., cells that have incorporated the selectable marker gene
will survive, while the other cells die). 

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can 
be used to produce (i.e., express) SECP protein. Accordingly, the invention further provides 
methods for producing SECP protein using the host cells of the invention. In one embodiment,
the method comprises culturing the host cell of the invention (i.e., into which a recombinant
expression vector encoding SECP protein has been introduced) in a suitable medium such that 
SECP protein is produced. In another embodiment, the method further comprises isolating 
SECP protein from the medium or the host cell. 

Transgenic Animals

The host cells of the invention can also be used to produce non-human transgenic 
animals. For example, in one embodiment, a host cell of the invention is a fertilized oocyte or an 
embryonic stem cell into which SECP protein-coding sequences have been introduced. These 
host cells can then be used to create non-human transgenic animals in which exogenous SECP
sequences have been introduced into their genome or homologous recombinant animals in which
endogenous SECP sequences have been altered. Such animals are useful for studying the
function and/or activity of SECP protein and for identifying and/or evaluating modulators of
SECP protein activity. As used herein, a "transgenic animal" is a non-human animal, preferably
a mammal, more preferably a rodent such as a rat or mouse, in which one or more of the cells of
the animal includes a transgene. Other examples of transgenic animals include non-human
primates, sheep, dogs, cows, goats, chickens, amphibians, etc. 

A transgene is exogenous DNA that is integrated into the genome of a cell from which a 
transgenic animal develops and that remains in the genome of the mature animal, thereby 
directing the expression of an encoded gene product in one or more cell types or tissues of the 
transgenic animal. As used herein, a "homologous recombinant animal" is a non-human animal,
preferably a mammal, more preferably a mouse, in which an endogenous SECP gene has been 
altered by homologous recombination between the endogenous gene and an exogenous DNA 
molecule introduced into a cell of the animal, e.g., an embryonic cell of the animal, prior to 
development of the animal. 

A transgenic animal of the invention can be created by introducing SECP-encoding
nucleic acid into the male pronuclei of a fertilized oocyte (e.g., by micro-injection, retroviral
infection) and allowing the oocyte to develop in a pseudopregnant female foster animal. The
human SECP cDNA sequences of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50,
52, 54 and/or 56 can be introduced as a transgene into the genome of a non-human animal.
Alternatively, a non-human homologue of the human SECP gene, such as a mouse SECP gene,
can be isolated based on hybridization to the human SECP cDNA (described further supra) and
used as a transgene. Intronic sequences and polyadenylation signals can also be included in the
transgene to increase the efficiency of expression of the transgene. A tissue-specific regulatory
sequence(s) can be operably-linked to the SECP transgene to direct expression of SECP protein 
to particular cells. Methods for generating transgenic animals via embryo manipulation and 
micro-injection, particularly animals such as mice, have become conventional in the art and are 
described, for example, in U.S. Patent Nos. 4,736,866; 4,870,009; and 4,873,191; and Hogan,
1986. In: Manipulating the Mouse Embryo, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y. Similar methods are used for production of other transgenic animals. A
transgenic founder animal can be identified based upon the presence of the SECP transgene in its
genome and/or expression of SECP mRNA in tissues or cells of the animals. A transgenic
founder animal can then be used to breed additional animals carrying the transgene. Moreover,
transgenic animals carrying a transgene encoding SECP protein can further be bred to other
transgenic animals carrying other transgenes. 

To create a homologous recombinant animal, a vector is prepared which contains at least 
a portion of a SECP gene into which a deletion, addition or substitution has been introduced to 
thereby alter, e.g., functionally disrupt, the SECP gene. The SECP gene can be a human gene
(e.g., the cDNA of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56),
but more preferably, is a non-human homologue of a human SECP gene. For example, a mouse
homologue of human SECP gene of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48,
50, 52, 54 and 56 can be used to construct a homologous recombination vector suitable for
altering an endogenous SECP gene in the mouse genome. In one embodiment, the vector is
designed such that, upon homologous recombination, the endogenous SECP gene is functionally
disrupted (i.e., no longer encodes a functional protein; also referred to as a "knock out" vector). 

Alternatively, the vector can be designed such that, upon homologous recombination, the 
endogenous SECP gene is mutated or otherwise altered but still encodes functional protein (e.g., 
the upstream regulatory region can be altered to thereby alter the expression of the endogenous 
SECP protein). In the homologous recombination vector, the altered portion of the SECP gene is
flanked at its 5'- and 3'-termini by additional nucleic acid of the SECP gene to allow for
homologous recombination to occur between the exogenous SECP gene carried by the vector
and an endogenous SECP gene in an embryonic stem cell. The additional flanking SECP nucleic
acid is of sufficient length for successful homologous recombination with the endogenous gene.
Typically, several kilobases (Kb) of flanking DNA (both at the 5'- and 3'-termini) are included in
the vector. See, e.g., Thomas, et al., 1987. Cell 51: 503 for a description of homologous
recombination vectors. The vector is then introduced into an embryonic stem cell line (e.g., by
electroporation) and cells in which the introduced SECP gene has homologously-recombined
with the endogenous SECP gene are selected. See, e.g., Li, et al., 1992. Cell 69: 915.

The selected cells are then micro-injected into a blastocyst of an animal (e.g., a mouse) to 
form aggregation chimeras. See, e.g., Bradley, 1987. In: Teratocarcinomas and Embryonic 
Stem Cells: A Practical Approach, Robertson, ed. IRL, Oxford, pp. 113-152. A chimeric
embryo can then be implanted into a suitable pseudopregnant female foster animal and the
embryo brought to term. Progeny harboring the homologously-recombined DNA in their germ
cells can be used to breed animals in which all cells of the animal contain the homologously-
recombined DNA by germline transmission of the transgene. Methods for constructing
homologous recombination vectors and homologous recombinant animals are described further
in Bradley, 1991. Curr. Opin. Biotechnol. 2: 823-829; PCT International Publication Nos.: WO
90/11354; WO 91/01140; WO 92/0968; and WO 93/04169.

In another embodiment, transgenic non-human animals can be produced that contain 
selected systems that allow for regulated expression of the transgene. One example of such a 
system is the cre/loxP recombinase system of bacteriophage P1. For a description of the cre/loxP
recombinase system, see, e.g., Lakso, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 6232-6236.
Another example of a recombinase system is the FLP recombinase system of Saccharomyces
cerevisiae. See, O'Gorman, et al., 1991. Science 251:1351-1355. If a cre/loxP recombinase
system is used to regulate expression of the transgene, animals containing transgenes encoding
both the Cre recombinase and a selected protein are required. Such animals can be provided
through the construction of "double" transgenic animals, e.g., by mating two transgenic animals,
one containing a transgene encoding a selected protein and the other containing a transgene
encoding a recombinase. 

Clones of the non-human transgenic animals described herein can also be produced 
according to the methods described in Wilmut, et al., 1997. Nature 385: 810-813. In brief, a cell 
(e.g., a somatic cell) from the transgenic animal can be isolated and induced to exit the growth
cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through the use of electrical
pulses, to an enucleated oocyte from an animal of the same species from which the quiescent cell 
is isolated. The reconstructed oocyte is then cultured such that it develops to morula or 
blastocyst and then transferred to a pseudopregnant female foster animal. The offspring born of
this female foster animal will be a clone of the animal from which the cell (e.g., the somatic cell) 
is isolated. 

Pharmaceutical Compositions 

The SECP nucleic acid molecules, SECP proteins, and anti-SECP antibodies (also 
referred to herein as "active compounds") of the invention, and derivatives, fragments, analogs
and homologs thereof, can be incorporated into pharmaceutical compositions suitable for 
administration. Such compositions typically comprise the nucleic acid molecule, protein, or 
antibody and a pharmaceutically-acceptable carrier. As used herein, "pharmaceutically- 
acceptable carrier" is intended to include any and all solvents, dispersion media, coatings,
antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like,
compatible with pharmaceutical administration. Suitable carriers are described in the most 
recent edition of Remington's Pharmaceutical Sciences, a standard reference text in the field, 
which is incorporated herein by reference. Preferred examples of such carriers or diluents 
include, but are not limited to, water, saline, Ringer's solutions, dextrose solution, and 5% human
serum albumin. Liposomes and other non-aqueous (i.e., lipophilic) vehicles such as fixed oils
may also be used. The use of such media and agents for pharmaceutically active substances is 
well known in the art. Except insofar as any conventional media or agent is incompatible with 
the active compound, use thereof in the compositions is contemplated. Supplementary active 
compounds can also be incorporated into the compositions. 

A pharmaceutical composition of the invention is formulated to be compatible with its
intended route of administration. Examples of routes of administration include parenteral, e.g.,
intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (i.e., topical), 
transmucosal, and rectal administration. Solutions or suspensions used for parenteral, 
intradermal, or subcutaneous application can include the following components: a sterile diluent
such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene
glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl 
parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as 
ethylenediaminetetraacetic acid (EDTA); buffers such as acetates, citrates or phosphates, and 
agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be
adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral
preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of
glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions
(where water soluble) or dispersions and sterile powders for the extemporaneous preparation of 
sterile injectable solutions or dispersion. For intravenous administration, suitable carriers 
include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N.J.) or
phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be 
fluid to the extent that easy syringeability exists. It must be stable under the conditions of 
manufacture and storage and must be preserved against the contaminating action of 
microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium 
containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and
liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can 
be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the 
required particle size in the case of dispersion and by the use of surfactants. Prevention of the 
action of microorganisms can be achieved by various antibacterial and antifungal agents, for 
example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many 
cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as 
mannitol, sorbitol, sodium chloride in the composition. Prolonged absorption of the injectable
compositions can be brought about by including in the composition an agent which delays 
absorption, for example, aluminum monostearate and gelatin. 

Sterile injectable solutions can be prepared by incorporating the active compound (e.g., a 
SECP protein or anti-SECP antibody) in the required amount in an appropriate solvent with one 
or a combination of ingredients enumerated above, as required, followed by filtered sterilization. 
Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle 
that contains a basic dispersion medium and the required other ingredients from those 
enumerated above. In the case of sterile powders for the preparation of sterile injectable 
solutions, methods of preparation are vacuum drying and freeze-drying that yield a powder of
the active ingredient plus any additional desired ingredient from a previously sterile-filtered 
solution thereof. 

Oral compositions generally include an inert diluent or an edible carrier. They can be 
enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic 
administration, the active compound can be incorporated with excipients and used in the form of 
tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier for use 
as a mouthwash, wherein the compound in the fluid carrier is applied orally and swished and 
expectorated or swallowed. Pharmaceutically compatible binding agents, and/or adjuvant 
materials can be included as part of the composition. The tablets, pills, capsules, troches and the
like can contain any of the following ingredients, or compounds of a similar nature: a binder
such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or
lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as
magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent 
such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or 
orange flavoring. 

For administration by inhalation, the compounds are delivered in the form of an aerosol 
spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such
as carbon dioxide, or a nebulizer. 

Systemic administration can also be by transmucosal or transdermal means. For 
transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated 
are used in the formulation. Such penetrants are generally known in the art, and include, for 
example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. 
Transmucosal administration can be accomplished through the use of nasal sprays or 
suppositories. For transdermal administration, the active compounds are formulated into 
ointments, salves, gels, or creams as generally known in the art. 

The compounds can also be prepared in the form of suppositories (e.g., with conventional
suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal 
delivery. 

In one embodiment, the active compounds are prepared with carriers that will protect the 
compound against rapid elimination from the body, such as a controlled release formulation, 
including implants and microencapsulated delivery systems. Biodegradable, biocompatible 
polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, 
collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will 
be apparent to those skilled in the art. The materials can also be obtained commercially from 
Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes 
targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as 
pharmaceutically acceptable carriers. These can be prepared according to methods known to 
those skilled in the art, for example, as described in U.S. Patent No. 4,522,811.

It is especially advantageous to formulate oral or parenteral compositions in dosage unit 
form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers 
to physically discrete units suited as unitary dosages for the subject to be treated; each unit 
containing a predetermined quantity of active compound calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specification for
the dosage unit forms of the invention are dictated by and directly dependent on the unique 
characteristics of the active compound and the particular therapeutic effect to be achieved, and 
the limitations inherent in the art of compounding such an active compound for the treatment of 
individuals.

The nucleic acid molecules of the invention can be inserted into vectors and used as gene 
therapy vectors. Gene therapy vectors can be delivered to a subject by, for example, intravenous 
injection, local administration (see, e.g., U.S. Patent No. 5,328,470) or by stereotactic injection 
(see, e.g., Chen, et al., 1994. Proc. Natl. Acad. Sci. USA 91: 3054-3057). The pharmaceutical
preparation of the gene therapy vector can include the gene therapy vector in an acceptable
diluent, or can comprise a slow release matrix in which the gene delivery vehicle is imbedded.
Alternatively, where the complete gene delivery vector can be produced intact from recombinant 
cells, e.g., retroviral vectors, the pharmaceutical preparation can include one or more cells that 
produce the gene delivery system. 

The pharmaceutical compositions can be included in a container, pack, or dispenser
together with instructions for administration.

Screening and Detection Methods 

The nucleic acid molecules, proteins, protein homologues, and antibodies described 
herein can be used in one or more of the following methods: (A) screening assays; (B) detection 
assays (e.g., chromosomal mapping, cell and tissue typing, forensic biology); (C) predictive
medicine (e.g., diagnostic assays, prognostic assays, monitoring clinical trials, and
pharmacogenomics); and (D) methods of treatment (e.g., therapeutic and prophylactic). 

The isolated nucleic acid molecules of the present invention can be used to express SECP 
protein (e.g., via a recombinant expression vector in a host cell in gene therapy applications), to 
detect SECP mRNA (e.g., in a biological sample) or a genetic lesion in an SECP gene, and to
modulate SECP activity, as described further below. In addition, the SECP proteins can be used
to screen drugs or compounds that modulate the SECP protein activity or expression as well as to 
treat disorders characterized by insufficient or excessive production of SECP protein or 
production of SECP protein forms that have decreased or aberrant activity compared to SECP 
wild-type protein. In addition, the anti-SECP antibodies of the present invention can be used to
detect and isolate SECP proteins and modulate SECP activity. 

The invention further pertains to novel agents identified by the screening assays 
described herein and uses thereof for treatments as previously described.

Screening Assays

The invention provides a method (also referred to herein as a "screening assay") for 
identifying modulators, i.e., candidate or test compounds or agents (e.g., peptides,
peptidomimetics, small molecules or other drugs) that bind to SECP proteins or have a
stimulatory or inhibitory effect on, e.g., SECP protein expression or SECP protein activity. The
invention also includes compounds identified in the screening assays described herein. 

In one embodiment, the invention provides assays for screening candidate or test 
compounds which bind to or modulate the activity of the membrane-bound form of a SECP 
protein or polypeptide or biologically-active portion thereof. The test compounds of the
invention can be obtained using any of the numerous approaches in combinatorial library
methods known in the art, including: biological libraries; spatially addressable parallel solid 
phase or solution phase libraries; synthetic library methods requiring deconvolution; the 
"one-bead one-compound" library method; and synthetic library methods using affinity 
chromatography selection. The biological library approach is limited to peptide libraries, while
the other four approaches are applicable to peptide, non-peptide oligomer or small molecule
libraries of compounds. See, e.g., Lam, 1997. Anticancer Drug Design 12: 145.

A "small molecule" as used herein, is meant to refer to a composition that has a molecular 
weight of less than about 5 kD and most preferably less than about 4 kD. Small molecules can 
be, e.g., nucleic acids, peptides, polypeptides, peptidomimetics, carbohydrates, lipids or other 
organic or inorganic molecules. Libraries of chemical and/or biological mixtures, such as fungal,
bacterial, or algal extracts, are known in the art and can be screened with any of the assays of the 
invention. 
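
As a purely illustrative sketch, and not as part of the methods described herein, the molecular weight criterion above could be applied to a compound library before screening as shown below. The compound names, the molecular weights, and the helper `is_small_molecule` are hypothetical, and treating "about" 5 kD and "about" 4 kD as strict cutoffs is an assumption made only for the example.

```python
from dataclasses import dataclass

# Thresholds taken from the definition above. Treating "about" as a
# strict cutoff is an assumption made for illustration only.
SMALL_MOLECULE_MAX_DA = 5000.0  # less than about 5 kD
PREFERRED_MAX_DA = 4000.0       # most preferably less than about 4 kD


@dataclass(frozen=True)
class Compound:
    name: str             # hypothetical library identifier
    mol_weight_da: float  # molecular weight in daltons


def is_small_molecule(compound, cutoff_da=SMALL_MOLECULE_MAX_DA):
    """Return True if the compound's molecular weight is below the cutoff."""
    return compound.mol_weight_da < cutoff_da


# Hypothetical library entries used only to exercise the filter.
library = [
    Compound("cmpd-A", 350.4),
    Compound("cmpd-B", 4500.0),
    Compound("cmpd-C", 12000.0),
]

small = [c.name for c in library if is_small_molecule(c)]
preferred = [c.name for c in library if is_small_molecule(c, PREFERRED_MAX_DA)]
```

Under these assumptions, `small` holds the compounds below the 5 kD cutoff and `preferred` holds those below the 4 kD cutoff.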

Examples of methods for the synthesis of molecular libraries can be found in the art, for 
example in: DeWitt, et al., 1993. Proc. Natl. Acad. Sci. U.S.A. 90: 6909; Erb, et al., 1994. Proc.
Natl. Acad. Sci. U.S.A. 91: 11422; Zuckermann, et al., 1994. J. Med. Chem. 37: 2678; Cho, et al.,
1993. Science 261: 1303; Carrell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2059; Carell, et
al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2061; and Gallop, et al., 1994. J. Med. Chem. 37:
1233. 

Libraries of compounds may be presented in solution (e.g., Houghten, 1992.
Biotechniques 13: 412-421), or on beads (Lam, 1991. Nature 354: 82-84), on chips (Fodor, 1993.
Nature 364: 555-556), bacteria (Ladner, U.S. Patent No. 5,223,409), spores (Ladner, U.S. Patent
5,233,409), plasmids (Cull, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 1865-1869) or on phage
(Scott and Smith, 1990. Science 249: 386-390; Devlin, 1990. Science 249: 404-406; Cwirla, et
al., 1990. Proc. Natl. Acad. Sci. U.S.A. 87: 6378-6382; Felici, 1991. J. Mol. Biol. 222: 301-310;
Ladner, U.S. Patent No. 5,233,409).

In one embodiment, an assay is a cell-based assay in which a cell which expresses a 
membrane-bound form of SECP protein, or a biologically-active portion thereof, on the cell 
surface is contacted with a test compound and the ability of the test compound to bind to a SECP
protein is determined. The cell, for example, can be of mammalian origin or a yeast cell.
Determining the ability of the test compound to bind to the SECP protein can be accomplished, 
for example, by coupling the test compound with a radioisotope or enzymatic label such that 
binding of the test compound to the SECP protein or biologically-active portion thereof can be
determined by detecting the labeled compound in a complex. For example, test compounds can
be labeled with 125I, 35S, 14C, or 3H, either directly or indirectly, and the radioisotope detected by
direct counting of radioemission or by scintillation counting. Alternatively, test compounds can
be enzymatically-labeled with, for example, horseradish peroxidase, alkaline phosphatase, or
luciferase, and the enzymatic label detected by determination of conversion of an appropriate
substrate to product. In one embodiment, the assay comprises contacting a cell which expresses
a membrane-bound form of SECP protein, or a biologically-active portion thereof, on the cell 
surface with a known compound which binds SECP to form an assay mixture, contacting the 
assay mixture with a test compound, and determining the ability of the test compound to interact 
with a SECP protein, wherein determining the ability of the test compound to interact with a 
SECP protein comprises determining the ability of the test compound to preferentially bind to
SECP protein or a biologically-active portion thereof as compared to the known compound. 
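
By way of a hedged illustration only, one common way to quantify how well a test compound competes with a labeled known compound is percent displacement of specific binding. The calculation below is a generic sketch and is not a procedure recited in this specification; the function name `percent_displacement`, the use of a nonspecific-binding control, and the example counts are all assumptions.

```python
def percent_displacement(bound_without_test, bound_with_test, nonspecific=0.0):
    """Percent of specifically bound labeled known compound displaced by a test compound.

    All three arguments are signal readings (e.g., scintillation counts) in the
    same units. The nonspecific-binding control is an assumed part of the assay
    design and defaults to zero.
    """
    specific_without = bound_without_test - nonspecific
    specific_with = bound_with_test - nonspecific
    if specific_without <= 0:
        raise ValueError("no specific binding of the known compound to displace")
    return 100.0 * (1.0 - specific_with / specific_without)


# Hypothetical readings: 10000 counts without the test compound,
# 4000 counts with it, and 1000 counts of nonspecific background.
displacement = percent_displacement(10000.0, 4000.0, nonspecific=1000.0)
```

A higher value indicates that the test compound displaces more of the known compound; what threshold counts as "preferential" binding is left to the practitioner.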

In another embodiment, an assay is a cell-based assay comprising contacting a cell 
expressing a membrane-bound form of SECP protein, or a biologically-active portion thereof, on 
the cell surface with a test compound and determining the ability of the test compound to
modulate (e.g., stimulate or inhibit) the activity of the SECP protein or biologically-active
portion thereof. Determining the ability of the test compound to modulate the activity of SECP
or a biologically-active portion thereof can be accomplished, for example, by determining the 
ability of the SECP protein to bind to or interact with a SECP target molecule. As used herein, a
"target molecule" is a molecule with which a SECP protein binds or interacts in nature, for
example, a molecule on the surface of a cell which expresses a SECP interacting protein, a
molecule on the surface of a second cell, a molecule in the extracellular milieu, a molecule 
associated with the internal surface of a cell membrane or a cytoplasmic molecule. An SECP 
target molecule can be a non-SECP molecule or a SECP protein or polypeptide of the invention. 
In one embodiment, a SECP target molecule is a component of a signal transduction pathway
that facilitates transduction of an extracellular signal (e.g., a signal generated by binding of a
compound to a membrane-bound SECP molecule) through the cell membrane and into the cell. 
The target, for example, can be a second intercellular protein that has catalytic activity or a 
protein that facilitates the association of downstream signaling molecules with SECP. 

Determining the ability of the SECP protein to bind to or interact with a SECP target
molecule can be accomplished by one of the methods described above for determining direct
binding. In one embodiment, determining the ability of the SECP protein to bind to or interact 
with a SECP target molecule can be accomplished by determining the activity of the target 
molecule. For example, the activity of the target molecule can be determined by detecting 
induction of a cellular second messenger of the target (i.e., intracellular Ca2+, diacylglycerol, IP3,
etc.), detecting catalytic/enzymatic activity of the target on an appropriate substrate, detecting the
induction of a reporter gene (comprising a SECP-responsive regulatory element operatively 
linked to a nucleic acid encoding a detectable marker, e.g., luciferase), or detecting a cellular 
response, for example, cell survival, cellular differentiation, or cell proliferation. 

In yet another embodiment, an assay of the invention is a cell-free assay comprising
contacting a SECP protein or biologically-active portion thereof with a test compound and
determining the ability of the test compound to bind to the SECP protein or biologically-active
portion thereof. Binding of the test compound to the SECP protein can be determined either
directly or indirectly as described above. In one such embodiment, the assay comprises
contacting the SECP protein or biologically-active portion thereof with a known compound
which binds SECP to form an assay mixture, contacting the assay mixture with a test compound,
and determining the ability of the test compound to interact with a SECP protein, wherein
determining the ability of the test compound to interact with a SECP protein comprises
determining the ability of the test compound to preferentially bind to SECP or biologically-active
portion thereof as compared to the known compound.

In still another embodiment, an assay is a cell-free assay comprising contacting SECP
protein or biologically-active portion thereof with a test compound and determining the ability of
the test compound to modulate (e.g., stimulate or inhibit) the activity of the SECP protein or
biologically-active portion thereof. Determining the ability of the test compound to modulate the
activity of SECP can be accomplished, for example, by determining the ability of the SECP
protein to bind to a SECP target molecule by one of the methods described above for
determining direct binding. In an alternative embodiment, determining the ability of the test
compound to modulate the activity of SECP protein can be accomplished by determining the
ability of the SECP protein to further modulate a SECP target molecule. For example, the
catalytic/enzymatic activity of the target molecule on an appropriate substrate can be determined
as described, supra. 

In yet another embodiment, the cell-free assay comprises contacting the SECP protein or
biologically-active portion thereof with a known compound which binds SECP protein to form
an assay mixture, contacting the assay mixture with a test compound, and determining the ability
of the test compound to interact with a SECP protein, wherein determining the ability of the test
compound to interact with a SECP protein comprises determining the ability of the SECP protein
to preferentially bind to or modulate the activity of a SECP target molecule.

The cell-free assays of the invention are amenable to use of either the soluble form or the
membrane-bound form of SECP protein. In the case of cell-free assays comprising the
membrane-bound form of SECP protein, it may be desirable to utilize a solubilizing agent such
that the membrane-bound form of SECP protein is maintained in solution. Examples of such
solubilizing agents include non-ionic detergents such as n-octylglucoside, n-dodecylglucoside,
n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton®
X-100, Triton® X-114, Thesit®, Isotridecypoly(ethylene glycol ether)n,
N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate,
3-(3-cholamidopropyl)dimethylamminiol-1-propane sulfonate (CHAPS), or
3-(3-cholamidopropyl)dimethylamminiol-2-hydroxy-1-propane sulfonate (CHAPSO).

In more than one embodiment of the above assay methods of the invention, it may be
desirable to immobilize either SECP protein or its target molecule to facilitate separation of
complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate
automation of the assay. Binding of a test compound to SECP protein, or interaction of SECP
protein with a target molecule in the presence and absence of a candidate compound, can be
accomplished in any vessel suitable for containing the reactants. Examples of such vessels
include microtiter plates, test tubes, and micro-centrifuge tubes. In one embodiment, a fusion
protein can be provided that adds a domain that allows one or both of the proteins to be bound to
a matrix. For example, GST-SECP fusion proteins or GST-target fusion proteins can be
adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or glutathione
derivatized microtiter plates, that are then combined with the test compound or the test
compound and either the non-adsorbed target protein or SECP protein, and the mixture is
incubated under conditions conducive to complex formation (e.g., at physiological conditions for
salt and pH). Following incubation, the beads or microtiter plate wells are washed to remove any
unbound components, the matrix is immobilized in the case of beads, and the complex is
determined either directly or indirectly, for example, as described, supra. Alternatively, the
complexes can be dissociated from the matrix, and the level of SECP protein binding or activity
determined using standard techniques.

Other techniques for immobilizing proteins on matrices can also be used in the screening
assays of the invention. For example, either the SECP protein or its target molecule can be
immobilized utilizing conjugation of biotin and streptavidin. Biotinylated SECP protein or target
molecules can be prepared from biotin-NHS (N-hydroxy-succinimide) using techniques
well-known within the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, Ill.), and
immobilized in the wells of streptavidin-coated 96 well plates (Pierce Chemical). Alternatively,
antibodies reactive with SECP protein or target molecules, but which do not interfere with
binding of the SECP protein to its target molecule, can be derivatized to the wells of the plate,
and unbound target or SECP protein trapped in the wells by antibody conjugation. Methods for
detecting such complexes, in addition to those described above for the GST-immobilized
complexes, include immunodetection of complexes using antibodies reactive with the SECP
protein or target molecule, as well as enzyme-linked assays that rely on detecting an enzymatic
activity associated with the SECP protein or target molecule.

In another embodiment, modulators of SECP protein expression are identified in a
method wherein a cell is contacted with a candidate compound and the expression of SECP
mRNA or protein in the cell is determined. The level of expression of SECP mRNA or protein
in the presence of the candidate compound is compared to the level of expression of SECP
mRNA or protein in the absence of the candidate compound. The candidate compound can then
be identified as a modulator of SECP mRNA or protein expression based upon this comparison.
For example, when expression of SECP mRNA or protein is greater (i.e., statistically
significantly greater) in the presence of the candidate compound than in its absence, the
candidate compound is identified as a stimulator of SECP mRNA or protein expression.
Alternatively, when expression of SECP mRNA or protein is less (statistically significantly less)
in the presence of the candidate compound than in its absence, the candidate compound is
identified as an inhibitor of SECP mRNA or protein expression. The level of SECP mRNA or
protein expression in the cells can be determined by methods described herein for detecting
SECP mRNA or protein.
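
By way of non-limiting illustration only, the comparison of expression levels described above could be sketched computationally as follows. The function names, the replicate measurements, the 0.05 significance threshold, and the choice of an exact permutation test are assumptions introduced here for illustration; the specification requires only that the difference be statistically significant and does not prescribe a particular test.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, untreated):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(treated) + list(untreated)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(untreated))
    total = 0
    extreme = 0
    # Enumerate every way of relabelling n_treated measurements as "treated".
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_modulator(treated, untreated, alpha=0.05):
    """Return 'stimulator', 'inhibitor', or 'no significant effect'."""
    if permutation_p_value(treated, untreated) >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(untreated) else "inhibitor"
```

Here `treated` and `untreated` would hold hypothetical replicate expression measurements (e.g., normalized mRNA levels) obtained in the presence and absence of the candidate compound, respectively. Exhaustive enumeration is practical only for small numbers of replicates.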

In yet another aspect of the invention, the SECP proteins can be used as "bait proteins" in
a two-hybrid assay or three hybrid assay (see, e.g., U.S. Patent No. 5,283,317; Zervos, et al.,
1993. Cell 72: 223-232; Madura, et al., 1993. J. Biol. Chem. 268: 12046-12054; Bartel, et al.,
1993. Biotechniques 14: 920-924; Iwabuchi, et al., 1993. Oncogene 8: 1693-1696; and Brent
WO 94/10300), to identify other proteins that bind to or interact with SECP ("SECP-binding
proteins" or "SECP-bp") and modulate SECP activity. Such SECP-binding proteins are also
likely to be involved in the propagation of signals by the SECP proteins as, for example,
upstream or downstream elements of the SECP pathway.

The two-hybrid system is based on the modular nature of most transcription factors, 
which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two 
different DNA constructs. In one construct, the gene that codes for SECP is fused to a gene 
encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the other 
construct, a DNA sequence, from a library of DNA sequences, that encodes an unidentified 
protein ("prey" or "sample") is fused to a gene that codes for the activation domain of the known 
transcription factor. If the "bait" and the "prey" proteins are able to interact, in vivo, forming a 
SECP-dependent complex, the DNA-binding and activation domains of the transcription factor 
are brought into close proximity. This proximity allows transcription of a reporter gene (e.g., 
LacZ) that is operably linked to a transcriptional regulatory site responsive to the transcription 
factor. Expression of the reporter gene can be detected and cell colonies containing the 
functional transcription factor can be isolated and used to obtain the cloned gene that encodes the 
protein which interacts with SECP. 

The invention further pertains to novel agents identified by the aforementioned screening 
assays and uses thereof for treatments as described herein. 

Detection Assays 

Portions or fragments of the cDNA sequences identified herein (and the corresponding 
complete gene sequences) can be used in numerous ways as polynucleotide reagents. By way of 
example, and not of limitation, these sequences can be used to: (i) map their respective genes on
a chromosome; and, thus, locate gene regions associated with genetic disease; (ii) identify an
individual from a minute biological sample (tissue typing); and (iii) aid in forensic identification 
of a biological sample. Some of these applications are described in the subsections below. 

Chromosome Mapping 

Once the sequence (or a portion of the sequence) of a gene has been isolated, this 
sequence can be used to map the location of the gene on a chromosome. This process is called 
chromosome mapping. Accordingly, portions or fragments of the SECP sequences shown in
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or fragments or
derivatives thereof, can be used to map the location of the SECP genes, respectively, on a
chromosome. The mapping of the SECP sequences to chromosomes is an important first step in
correlating these sequences with genes associated with disease.

Briefly, SECP genes can be mapped to chromosomes by preparing PCR primers
(preferably 15-25 bp in length) from the SECP sequences. Computer analysis of the SECP
sequences can be used to rapidly select primers that do not span more than one exon in the
genomic DNA, since primers that span more than one exon would complicate the amplification
process. These primers can then be used for PCR screening of somatic cell hybrids containing
individual human chromosomes. Only those hybrids containing the human gene corresponding
to the SECP sequences will yield an amplified fragment.
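
By way of non-limiting illustration only, the computer-assisted selection of primers that lie within a single exon could be sketched as follows. The function name and the exon coordinates are hypothetical; the sketch assumes that exon boundaries within the cDNA are already known (for example, from an alignment of the cDNA to genomic DNA) and does not address melting temperature or primer specificity.

```python
def primers_within_single_exon(cdna, exon_bounds, min_len=15, max_len=25):
    """Return (start, end, sequence) for candidate primers lying inside one exon.

    exon_bounds: list of (start, end) half-open intervals in cDNA coordinates.
    """
    candidates = []
    for exon_start, exon_end in exon_bounds:
        for length in range(min_len, max_len + 1):
            # A window that would run past the exon end is never generated,
            # so no candidate crosses an exon-exon junction.
            for start in range(exon_start, exon_end - length + 1):
                end = start + length
                candidates.append((start, end, cdna[start:end]))
    return candidates
```

Each returned candidate is 15-25 bases long, consistent with the preferred primer length stated above, and is contained entirely in one exon.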


Somatic cell hybrids are prepared by fusing somatic cells from different mammals (e.g.,
human and mouse cells). As hybrids of human and mouse cells grow and divide, they gradually
lose human chromosomes in random order, but retain the mouse chromosomes. By using media
in which mouse cells cannot grow, because they lack a particular enzyme, but in which human
cells can, the one human chromosome that contains the gene encoding the needed enzyme will
be retained. By using various media, panels of hybrid cell lines can be established. Each cell
line in a panel contains either a single human chromosome or a small number of human
chromosomes, and a full set of mouse chromosomes, allowing easy mapping of individual genes
to specific human chromosomes. See, e.g., D'Eustachio, et al., 1983. Science 220: 919-924.
Somatic cell hybrids containing only fragments of human chromosomes can also be produced by
using human chromosomes with translocations and deletions.

PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular
sequence to a particular chromosome. Three or more sequences can be assigned per day using a
single thermal cycler. Using the SECP sequences to design oligonucleotide primers,
sublocalization can be achieved with panels of fragments from specific chromosomes.
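
By way of non-limiting illustration only, the assignment of a sequence to a chromosome from PCR results on a somatic cell hybrid panel could be sketched as follows. The function name, the hybrid names, and the panel composition are hypothetical; the sketch assumes an idealized panel with no false-positive or false-negative amplifications.

```python
def assign_chromosome(panel, pcr_positive):
    """Return chromosomes whose presence across hybrids matches the PCR results.

    panel: dict mapping hybrid name -> set of human chromosomes it retains.
    pcr_positive: dict mapping hybrid name -> True if an amplified fragment was seen.
    """
    all_chromosomes = set().union(*panel.values())
    concordant = []
    for chrom in sorted(all_chromosomes):
        # A chromosome is concordant when every hybrid that retains it is
        # PCR-positive and every hybrid that lacks it is PCR-negative.
        if all((chrom in retained) == pcr_positive[hybrid]
               for hybrid, retained in panel.items()):
            concordant.append(chrom)
    return concordant
```

A single concordant chromosome indicates an unambiguous assignment; several concordant chromosomes indicate that additional hybrids are needed in the panel.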

Fluorescence in situ hybridization (FISH) of a DNA sequence to a metaphase
chromosomal spread can further be used to provide a precise chromosomal location in one step.
Chromosome spreads can be made using cells whose division has been blocked in metaphase by
a chemical like colcemid that disrupts the mitotic spindle. The chromosomes can be treated
briefly with trypsin, and then stained with Giemsa. A pattern of light and dark bands develops
on each chromosome, so that the chromosomes can be identified individually. The FISH
technique can be used with a DNA sequence as short as 500 or 600 bases. However, clones
larger than 1,000 bases have a higher likelihood of binding to a unique chromosomal location
with sufficient signal intensity for simple detection. Preferably 1,000 bases, and more preferably
2,000 bases, will suffice to get good results in a reasonable amount of time. For a review of this
technique, see, Verma, et al., Human Chromosomes: A Manual of Basic Techniques
(Pergamon Press, New York 1988).

Reagents for chromosome mapping can be used individually to mark a single
chromosome or a single site on that chromosome, or panels of reagents can be used for marking
multiple sites and/or multiple chromosomes. Reagents corresponding to non-coding regions of
the genes actually are preferred for mapping purposes. Coding sequences are more likely to be
conserved within gene families, thus increasing the chance of cross hybridizations during
chromosomal mapping.

Once a sequence has been mapped to a precise chromosomal location, the physical
position of the sequence on the chromosome can be correlated with genetic map data. Such data
are found, e.g., in McKusick, Mendelian Inheritance in Man (available on-line through Johns
Hopkins University Welch Medical Library). The relationship between genes and disease,
mapped to the same chromosomal region, can then be identified through linkage analysis
(co-inheritance of physically adjacent genes), described in, e.g., Egeland, et al., 1987. Nature,
325: 783-787.

Additionally, differences in the DNA sequences between individuals affected and
unaffected with a disease associated with the SECP gene, can be determined. If a mutation is
observed in some or all of the affected individuals but not in any unaffected individuals, then the
mutation is likely to be the causative agent of the particular disease. Comparison of affected and
unaffected individuals generally involves first looking for structural alterations in the
chromosomes, such as deletions or translocations that are visible from chromosome spreads or
detectable using PCR based on that DNA sequence. Ultimately, complete sequencing of genes
from several individuals can be performed to confirm the presence of a mutation and to
distinguish mutations from polymorphisms.
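
By way of non-limiting illustration only, the comparison of affected and unaffected individuals described above could be sketched as follows. The function name, the individual identifiers, and the variant labels are hypothetical; the sketch assumes that each individual's sequence differences relative to a reference have already been determined.

```python
def candidate_causative_variants(affected, unaffected):
    """Variants seen in at least one affected and in no unaffected individual.

    Each argument maps an individual ID to the set of variants observed in
    that individual, relative to a reference sequence.
    """
    seen_in_unaffected = set().union(*unaffected.values()) if unaffected else set()
    counts = {}
    for variants in affected.values():
        # Variants shared with unaffected individuals are set aside as
        # probable polymorphisms rather than disease-associated mutations.
        for v in variants - seen_in_unaffected:
            counts[v] = counts.get(v, 0) + 1
    # Report the fraction of affected individuals carrying each candidate.
    return {v: n / len(affected) for v, n in counts.items()}
```

A candidate carried by all or most affected individuals would be a stronger candidate for further study than one carried by a single affected individual.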

Tissue Typing 

The SECP sequences of the invention can also be used to identify individuals from
minute biological samples. In this technique, an individual's genomic DNA is digested with one
or more restriction enzymes, and probed on a Southern blot to yield unique bands for
identification. The sequences of the invention are useful as additional DNA markers for RFLP
("restriction fragment length polymorphisms," as described in U.S. Patent No. 5,272,057).

Furthermore, the sequences of the invention can be used to provide an alternative
technique that determines the actual base-by-base DNA sequence of selected portions of an
individual's genome. Thus, the SECP sequences described herein can be used to prepare two
PCR primers from the 5'- and 3'-termini of the sequences. These primers can then be used to
amplify an individual's DNA and subsequently sequence it.

Panels of corresponding DNA sequences from individuals, prepared in this manner, can
provide unique individual identifications, as each individual will have a unique set of such DNA
sequences due to allelic differences. The sequences of the invention can be used to obtain such
identification sequences from individuals and from tissue. The SECP sequences of the invention
uniquely represent portions of the human genome. Allelic variation occurs to some degree in the
coding regions of these sequences, and to a greater degree in the non-coding regions. It is
estimated that allelic variation between individual humans occurs with a frequency of about once
per each 500 bases. Much of the allelic variation is due to single nucleotide polymorphisms
(SNPs), which include restriction fragment length polymorphisms (RFLPs).

Each of the sequences described herein can, to some degree, be used as a standard against
which DNA from an individual can be compared for identification purposes. Because greater
numbers of polymorphisms occur in the non-coding regions, fewer sequences are necessary to
differentiate individuals. The non-coding sequences can comfortably provide positive individual
identification with a panel of perhaps 10 to 1,000 primers that each yield a non-coding amplified
sequence of 100 bases. If predicted coding sequences, such as those in SEQ ID NO:1, 3, 5, 7, 9,
11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 are used, a more appropriate number of
primers for positive individual identification would be 500-2,000.
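
By way of non-limiting illustration only, the figures given above can be combined in a simple calculation. The function name is hypothetical, and the sketch assumes, as a simplification, that the stated average of about one allelic variation per 500 bases applies uniformly to every amplified sequence; as noted above, variation is expected to be more frequent in non-coding regions and less frequent in coding regions.

```python
def expected_variant_sites(n_amplicons, amplicon_length=100, bases_per_variant=500):
    """Expected number of variable positions covered by a panel of amplicons."""
    return n_amplicons * amplicon_length / bases_per_variant
```

Under these assumptions, a panel of 10 amplified sequences of 100 bases each covers 1,000 bases and about 2 variable positions, whereas a panel of 1,000 such sequences covers about 200.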

Use of Partial SECP Sequences in Forensic Biology

DNA-based identification techniques can also be used in forensic biology. Forensic
biology is a scientific field employing genetic typing of biological evidence found at a crime
scene as a means for positively identifying, e.g., a perpetrator of a crime. To make such an
identification, PCR technology can be used to amplify DNA sequences taken from very small
biological samples such as tissues (e.g., hair or skin) or body fluids (e.g., blood, saliva, or semen)
found at a crime scene. The amplified sequence can then be compared to a standard, thereby
allowing identification of the origin of the biological sample.

The sequences of the invention can be used to provide polynucleotide reagents, e.g., PCR
primers, targeted to specific loci in the human genome, that can enhance the reliability of
DNA-based forensic identifications by, for example, providing another "identification marker"
(i.e., another DNA sequence that is unique to a particular individual). As mentioned above,
actual base sequence information can be used for identification as an accurate alternative to
patterns formed by restriction enzyme generated fragments. Sequences targeted to non-coding
regions of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 are
particularly appropriate for this use as greater numbers of polymorphisms occur in the
non-coding regions, making it easier to differentiate individuals using this technique. Examples of
polynucleotide reagents include the SECP sequences or portions thereof, e.g., fragments derived
from the non-coding regions of one or more of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56 having a length of at least 20 bases, preferably at least 30 bases.

The SECP sequences described herein can further be used to provide polynucleotide
reagents, e.g., labeled or label-able probes that can be used, for example, in an in situ
hybridization technique, to identify a specific tissue (e.g., brain tissue, etc). This can be very
useful in cases where a forensic pathologist is presented with a tissue of unknown origin. Panels
of such SECP probes can be used to identify tissue by species and/or by organ type.

In a similar fashion, these reagents, e.g., SECP primers or probes can be used to screen
tissue culture for contamination (i.e., screen for the presence of a mixture of different types of
cells in a culture).

Predictive Medicine 

The invention also pertains to the field of predictive medicine in which diagnostic assays,
prognostic assays, pharmacogenomics, and monitoring clinical trials are used for prognostic
(predictive) purposes to thereby treat an individual prophylactically. Accordingly, one aspect of
the invention relates to diagnostic assays for determining SECP protein and/or nucleic acid
expression as well as SECP activity, in the context of a biological sample (e.g., blood, serum,
cells, tissue) to thereby determine whether an individual is afflicted with a disease or disorder, or
is at risk of developing a disorder, associated with aberrant SECP expression or activity. The
invention also provides for prognostic (or predictive) assays for determining whether an
individual is at risk of developing a disorder associated with SECP protein, nucleic acid
expression or activity. For example, mutations in a SECP gene can be assayed in a biological
sample. Such assays can be used for prognostic or predictive purpose to thereby prophylactically
treat an individual prior to the onset of a disorder characterized by or associated with SECP
protein, nucleic acid expression, or biological activity.

Another aspect of the invention provides methods for determining SECP protein, nucleic
acid expression or activity in an individual to thereby select appropriate therapeutic or
prophylactic agents for that individual (referred to herein as "pharmacogenomics").
Pharmacogenomics allows for the selection of agents (e.g., drugs) for therapeutic or prophylactic
treatment of an individual based on the genotype of the individual (e.g., the genotype of the
individual examined to determine the ability of the individual to respond to a particular agent).

Yet another aspect of the invention pertains to monitoring the influence of agents (e.g.,
drugs, compounds) on the expression or activity of SECP in clinical trials.

These and various other agents are described in further detail in the following sections.

Diagnostic Assays

An exemplary method for detecting the presence or absence of SECP in a biological
sample involves obtaining a biological sample from a test subject and contacting the biological
sample with a compound or an agent capable of detecting SECP protein or nucleic acid (e.g.,
mRNA, genomic DNA) that encodes SECP protein such that the presence of SECP is detected in
the biological sample. An agent for detecting SECP mRNA or genomic DNA is a labeled
nucleic acid probe capable of hybridizing to SECP mRNA or genomic DNA. The nucleic acid
probe can be, for example, a full-length SECP nucleic acid, such as the nucleic acid of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 or a portion thereof, such as
an oligonucleotide of at least 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to
specifically hybridize under stringent conditions to SECP mRNA or genomic DNA. Other
suitable probes for use in the diagnostic assays of the invention are described herein.

An agent for detecting SECP protein is an antibody capable of binding to SECP protein,
preferably an antibody with a detectable label. Antibodies can be polyclonal, or more preferably,
monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2) can be used. The term
"labeled", with regard to the probe or antibody, is intended to encompass direct labeling of the
probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or
antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent
that is directly labeled. Examples of indirect labeling include detection of a primary antibody
using a fluorescently-labeled secondary antibody and end-labeling of a DNA probe with biotin
such that it can be detected with fluorescently-labeled streptavidin. The term "biological
sample" is intended to include tissues, cells and biological fluids isolated from a subject, as well
as tissues, cells and fluids present within a subject. That is, the detection method of the invention
can be used to detect SECP mRNA, protein, or genomic DNA in a biological sample in vitro as
well as in vivo. For example, in vitro techniques for detection of SECP mRNA include Northern
hybridizations and in situ hybridizations. In vitro techniques for detection of SECP protein
include enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations,
and immunofluorescence. In vitro techniques for detection of SECP genomic DNA include
Southern hybridizations. Furthermore, in vivo techniques for detection of SECP protein include
introducing into a subject a labeled anti-SECP antibody. For example, the antibody can be
labeled with a radioactive marker whose presence and location in a subject can be detected by
standard imaging techniques.

In one embodiment, the biological sample contains protein molecules from the test
subject. Alternatively, the biological sample can contain mRNA molecules from the test subject
or genomic DNA molecules from the test subject. A preferred biological sample is a peripheral
blood leukocyte sample isolated by conventional means from a subject.

In another embodiment, the methods further involve obtaining a control biological
sample from a control subject, contacting the control sample with a compound or agent capable
of detecting SECP protein, mRNA, or genomic DNA, such that the presence of SECP protein,
mRNA or genomic DNA is detected in the biological sample, and comparing the presence of
SECP protein, mRNA or genomic DNA in the control sample with the presence of SECP
protein, mRNA or genomic DNA in the test sample.

The invention also encompasses kits for detecting the presence of SECP in a biological
sample. For example, the kit can comprise: a labeled compound or agent capable of detecting
SECP protein or mRNA in a biological sample; means for determining the amount of SECP in
the sample; and means for comparing the amount of SECP in the sample with a standard. The
compound or agent can be packaged in a suitable container. The kit can further comprise
instructions for using the kit to detect SECP protein or nucleic acid.

Prognostic Assays 

The diagnostic methods described herein can furthermore be utilized to identify subjects
having or at risk of developing a disease or disorder associated with aberrant SECP expression or
activity. For example, the assays described herein, such as the preceding diagnostic assays or the
following assays, can be utilized to identify a subject having or at risk of developing a disorder
associated with SECP protein, nucleic acid expression or activity. Alternatively, the prognostic
assays can be utilized to identify a subject having or at risk for developing a disease or disorder.
Thus, the invention provides a method for identifying a disease or disorder associated with
aberrant SECP expression or activity in which a test sample is obtained from a subject and SECP
protein or nucleic acid (e.g., mRNA, genomic DNA) is detected, wherein the presence of SECP
protein or nucleic acid is diagnostic for a subject having or at risk of developing a disease or
disorder associated with aberrant SECP expression or activity. As used herein, a "test sample"
refers to a biological sample obtained from a subject of interest. For example, a test sample can
be a biological fluid (e.g., serum), cell sample, or tissue.

Furthermore, the prognostic assays described herein can be used to determine whether a
subject can be administered an agent (e.g., an agonist, antagonist, peptidomimetic, protein,
peptide, nucleic acid, small molecule, or other drug candidate) to treat a disease or disorder
associated with aberrant SECP expression or activity. For example, such methods can be used to
determine whether a subject can be effectively treated with an agent for a disorder. Thus, the
invention provides methods for determining whether a subject can be effectively treated with an
agent for a disorder associated with aberrant SECP expression or activity in which a test sample
is obtained and SECP protein or nucleic acid is detected (e.g., wherein the presence of SECP
protein or nucleic acid is diagnostic for a subject that can be administered the agent to treat a
disorder associated with aberrant SECP expression or activity).

The methods of the invention can also be used to detect genetic lesions in a SECP gene,
thereby determining if a subject with the lesioned gene is at risk for a disorder characterized by
aberrant cell proliferation and/or differentiation. In various embodiments, the methods include
detecting, in a sample of cells from the subject, the presence or absence of a genetic lesion
characterized by at least one of an alteration affecting the integrity of a gene encoding a
SECP-protein, or the mis-expression of the SECP gene. For example, such genetic lesions can
be detected by ascertaining the existence of at least one of: (i) a deletion of one or more
nucleotides from a SECP gene; (ii) an addition of one or more nucleotides to a SECP gene;
(iii) a substitution of one or more nucleotides of a SECP gene; (iv) a chromosomal rearrangement
of a SECP gene; (v) an alteration in the level of a messenger RNA transcript of a SECP gene;
(vi) aberrant modification of a SECP gene, such as of the methylation pattern of the genomic
DNA; (vii) the presence of a non-wild-type splicing pattern of a messenger RNA transcript of a
SECP gene; (viii) a non-wild-type level of a SECP protein; (ix) allelic loss of a SECP gene; and
(x) inappropriate post-translational modification of a SECP protein. As described herein, there
are a large number of assay techniques known in the art which can be used for detecting lesions
in a SECP gene. A preferred biological sample is a peripheral blood leukocyte sample isolated
by conventional means from a subject. However, any biological sample containing nucleated
cells may be used, including, for example, buccal mucosal cells.
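
By way of non-limiting illustration only, lesions of types (i) through (iii) above (deletion, addition, and substitution of nucleotides) could be recognized computationally by comparing a sample sequence with a reference sequence. The sketch below uses the `SequenceMatcher` class of the Python standard library `difflib` module as a simple stand-in for a sequence alignment tool; the function name and the sequences are hypothetical, and a dedicated alignment method would ordinarily be preferred for real sequence data.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags onto the lesion types named in the text.
LESION_NAMES = {"delete": "deletion", "insert": "addition", "replace": "substitution"}


def classify_lesions(reference, sample):
    """List (lesion type, reference position, reference bases, sample bases)."""
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    lesions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            lesions.append((LESION_NAMES[tag], i1, reference[i1:i2], sample[j1:j2]))
    return lesions
```

Positions are zero-based offsets into the reference sequence. Lesion types (iv) through (x) concern rearrangements, expression levels, and modifications, and are not addressed by a sequence comparison of this kind.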

In certain embodiments, detection of the lesion involves the use of a probe/primer in a
polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such as
anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g.,
Landegran, et al., 1988. Science 241: 1077-1080; and Nakazawa, et al., 1994. Proc. Natl. Acad.
Sci. USA 91: 360-364), the latter of which can be particularly useful for detecting point
mutations in the SECP-gene (see, Abravaya, et al., 1995. Nucl. Acids Res. 23: 675-682). This
method can include the steps of collecting a sample of cells from a patient, isolating nucleic acid
(e.g., genomic, mRNA or both) from the cells of the sample, contacting the nucleic acid sample
with one or more primers that specifically hybridize to a SECP gene under conditions such that
hybridization and amplification of the SECP gene (if present) occurs, and detecting the presence
or absence of an amplification product, or detecting the size of the amplification product and
comparing the length to a control sample. It is anticipated that PCR and/or LCR may be
desirable to use as a preliminary amplification step in conjunction with any of the techniques
used for detecting mutations described herein.

Alternative amplification methods include: self-sustained sequence replication (see,
Guatelli, et al., 1990. Proc. Natl. Acad. Sci. USA 87: 1874-1878), transcriptional amplification
system (see, Kwoh, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 1173-1177); Qβ Replicase (see,
Lizardi, et al., 1988. BioTechnology 6: 1197), or any other nucleic acid amplification method,
followed by the detection of the amplified molecules using techniques well known to those of
skill in the art. These detection schemes are especially useful for the detection of nucleic acid
molecules if such molecules are present in very low numbers.

In an alternative embodiment, mutations in a SECP gene from a sample cell can be
identified by alterations in restriction enzyme cleavage patterns. For example, sample and
control DNA is isolated, amplified (optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined by gel electrophoresis and compared.
Differences in fragment length sizes between sample and control DNA indicate mutations in the
sample DNA. Moreover, sequence-specific ribozymes (see, e.g., U.S. Patent No.
5,493,531) can be used to score for the presence of specific mutations by development or loss of
a ribozyme cleavage site.
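
The restriction-pattern comparison described above can be illustrated with a minimal
sketch. The sequences below are invented for illustration and are not SECP sequences; the
function name `digest` is hypothetical, and the EcoRI recognition site (GAATTC, cut after
the first G) is used only as a familiar example of an endonuclease.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return sorted fragment lengths after cutting a linear sequence
    at every occurrence of a recognition site (single strand view)."""
    cuts = []
    start = 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)  # position of the cut within seq
        start = idx + 1
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


ECORI_SITE = "GAATTC"  # EcoRI cuts G^AATTC, i.e. one base into the site

# Hypothetical control and sample sequences (illustrative only).
control = "ATGCGAATTCGGCTAAGCTTGAATTCCGTA"
sample = "ATGCGAATTCGGCTAAGCTTGAGTTCCGTA"  # one base change removes the second site

control_fragments = digest(control, ECORI_SITE, 1)
sample_fragments = digest(sample, ECORI_SITE, 1)

# A difference in fragment lengths suggests a sequence change in the sample.
pattern_differs = control_fragments != sample_fragments
```

In this toy case the control yields three fragments and the sample only two, which is the
kind of difference that would be read from a gel.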

In other embodiments, genetic mutations in SECP can be identified by hybridizing a
sample and control nucleic acids, e.g., DNA or RNA, to high-density arrays containing hundreds
or thousands of oligonucleotide probes. See, e.g., Cronin, et al., 1996. Human Mutation 7:
244-255; Kozal, et al., 1996. Nat. Med. 2: 753-759. For example, genetic mutations in SECP can
be identified in two dimensional arrays containing light-generated DNA probes as described in
Cronin, et al., supra. Briefly, a first hybridization array of probes can be used to scan through
long stretches of DNA in a sample and control to identify base changes between the sequences
by making linear arrays of sequential overlapping probes. This step allows the identification of
point mutations. This is followed by a second hybridization array that allows the
characterization of specific mutations by using smaller, specialized probe arrays complementary
to all variants or mutations detected. Each mutation array is composed of parallel probe sets, one
complementary to the wild-type gene and the other complementary to the mutant gene.
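
The first scanning step, in which sequential overlapping probes tile a reference
sequence, can be sketched as follows. This is a simplified model under stated assumptions:
hybridization is treated as exact string identity at the probe's position, the sequences are
invented, and the names `tile_probes`, `failing_probes` and `localize` are hypothetical.

```python
def tile_probes(reference: str, probe_len: int) -> list[tuple[int, str]]:
    """Build sequential overlapping probes, one starting at every position."""
    return [(i, reference[i:i + probe_len])
            for i in range(len(reference) - probe_len + 1)]


def failing_probes(probes: list[tuple[int, str]], sample: str) -> list[int]:
    """Start positions of probes with no perfect match at their position
    in the sample (assumption: perfect match stands in for hybridization)."""
    return [start for start, probe in probes
            if sample[start:start + len(probe)] != probe]


def localize(failed_starts: list[int], probe_len: int) -> list[int]:
    """Positions covered by every failing probe, i.e. the candidate change."""
    if not failed_starts:
        return []
    return list(range(max(failed_starts), min(failed_starts) + probe_len))


reference = "ACGTACGGTCA"  # hypothetical wild-type stretch
sample = "ACGTATGGTCA"     # single base change at 0-based position 5

probes = tile_probes(reference, probe_len=4)
failed = failing_probes(probes, sample)
candidate_positions = localize(failed, probe_len=4)
```

Every probe overlapping the changed base fails, and the overlap of the failing probes
narrows the change to a single position, which is what allows a second, specialized array
to be chosen.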

In yet another embodiment, any of a variety of sequencing reactions known in the art can
be used to directly sequence the SECP gene and detect mutations by comparing the sequence of
the sample SECP with the corresponding wild-type (control) sequence. Examples of sequencing
reactions include those based on techniques developed by Maxam and Gilbert, 1977. Proc. Natl.
Acad. Sci. USA 74: 560 or Sanger, 1977. Proc. Natl. Acad. Sci. USA 74: 5463. It is also
contemplated that any of a variety of automated sequencing procedures can be utilized when
performing the diagnostic assays (see, e.g., Naeve, et al., 1995. Biotechniques 19: 448),
including sequencing by mass spectrometry (see, e.g., PCT International Publication No. WO
94/16101; Cohen, et al., 1996. Adv. Chromatography 36: 127-162; and Griffin, et al., 1993.
Appl. Biochem. Biotechnol. 38: 147-159).
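
Once sample and wild-type sequences are in hand, the comparison itself is
straightforward. The following is a minimal sketch that assumes the two sequences are
already aligned and of equal length (so it reports substitutions only, not insertions or
deletions); the function name and the sequences are hypothetical.

```python
def point_differences(wild_type: str, sample: str) -> list[tuple[int, str, str]]:
    """List substitutions as (1-based position, wild-type base, sample base).

    Assumes pre-aligned sequences of equal length.
    """
    if len(wild_type) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, w, s)
            for i, (w, s) in enumerate(zip(wild_type, sample))
            if w != s]


# Hypothetical six-base stretch with one substitution at position 4.
differences = point_differences("ATGGCC", "ATGACC")
```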

Other methods for detecting mutations in the SECP gene include methods in which
protection from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA
heteroduplexes. See, e.g., Myers, et al., 1985. Science 230: 1242. In general, the art technique
of "mismatch cleavage" starts by providing heteroduplexes formed by hybridizing (labeled)
RNA or DNA containing the wild-type SECP sequence with potentially mutant RNA or DNA
obtained from a tissue sample. The double-stranded duplexes are treated with an agent that
cleaves single-stranded regions of the duplex, such as those that exist due to basepair
mismatches between the control and sample strands. For instance, RNA/DNA duplexes can be
treated with RNase and DNA/DNA hybrids treated with S1 nuclease to enzymatically digest
the mismatched regions. In other embodiments, either DNA/DNA or RNA/DNA duplexes can
be treated with hydroxylamine or osmium tetroxide and with piperidine in order to digest
mismatched regions. After digestion of the mismatched regions, the resulting material is then
separated by size on denaturing polyacrylamide gels to determine the site of mutation. See, e.g.,
Cotton, et al., 1988. Proc. Natl. Acad. Sci. USA 85: 4397; Saleeba, et al., 1992. Methods
Enzymol. 217: 286-295. In an embodiment, the control DNA or RNA can be labeled for
detection.

In still another embodiment, the mismatch cleavage reaction employs one or more
proteins that recognize mismatched base pairs in double-stranded DNA (so called "DNA
mismatch repair" enzymes) in defined systems for detecting and mapping point mutations in
SECP cDNAs obtained from samples of cells. For example, the mutY enzyme of E. coli cleaves
A at G/A mismatches and the thymidine DNA glycosylase from HeLa cells cleaves T at G/T
mismatches. See, e.g., Hsu, et al., 1994. Carcinogenesis 15: 1657-1662. According to an
exemplary embodiment, a probe based on a SECP sequence, e.g., a wild-type SECP sequence, is
hybridized to a cDNA or other DNA product from a test cell(s). The duplex is treated with a
DNA mismatch repair enzyme, and the cleavage products, if any, can be detected from
electrophoresis protocols or the like. See, e.g., U.S. Patent No. 5,459,039.

In other embodiments, alterations in electrophoretic mobility will be used to identify
mutations in SECP genes. For example, single strand conformation polymorphism (SSCP) may
be used to detect differences in electrophoretic mobility between mutant and wild type nucleic
acids. See, e.g., Orita, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 2766; Cotton, 1993. Mutat.
Res. 285: 125-144; Hayashi, 1992. Genet. Anal. Tech. Appl. 9: 73-79. Single-stranded DNA
fragments of sample and control SECP nucleic acids will be denatured and allowed to renature.
The secondary structure of single-stranded nucleic acids varies according to sequence; the
resulting alteration in electrophoretic mobility enables the detection of even a single base change.
The DNA fragments may be labeled or detected with labeled probes. The sensitivity of the assay
may be enhanced by using RNA (rather than DNA), in which the secondary structure is more
sensitive to a change in sequence. In one embodiment, the subject method utilizes heteroduplex
analysis to separate double stranded heteroduplex molecules on the basis of changes in
electrophoretic mobility. See, e.g., Keen, et al., 1991. Trends Genet. 7: 5.

In yet another embodiment, the movement of mutant or wild-type fragments in
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing gradient gel
electrophoresis (DGGE). See, e.g., Myers, et al., 1985. Nature 313: 495. When DGGE is used
as the method of analysis, DNA will be modified to ensure that it does not completely denature,
for example by adding a GC clamp of approximately 40 bp of high-melting GC-rich DNA by
PCR. In a further embodiment, a temperature gradient is used in place of a denaturing gradient
to identify differences in the mobility of control and sample DNA. See, e.g., Rosenbaum and
Reissner, 1987. Biophys. Chem. 265: 12753.

Examples of other techniques for detecting point mutations include, but are not limited
to, selective oligonucleotide hybridization, selective amplification, or selective primer extension.
For example, oligonucleotide primers may be prepared in which the known mutation is placed
centrally and then hybridized to target DNA under conditions that permit hybridization only if a
perfect match is found. See, e.g., Saiki, et al., 1986. Nature 324: 163; Saiki, et al., 1989. Proc.
Natl. Acad. Sci. USA 86: 6230. Such allele specific oligonucleotides are hybridized to PCR
amplified target DNA or a number of different mutations when the oligonucleotides are attached
to the hybridizing membrane and hybridized with labeled target DNA.

Alternatively, allele specific amplification technology that depends on selective PCR
amplification may be used in conjunction with the instant invention. Oligonucleotides used as
primers for specific amplification may carry the mutation of interest in the center of the molecule
(so that amplification depends on differential hybridization; see, e.g., Gibbs, et al., 1989. Nucl.
Acids Res. 17: 2437-2448) or at the extreme 3'-terminus of one primer where, under appropriate
conditions, mismatch can prevent or reduce polymerase extension (see, e.g., Prosser, 1993.
Tibtech. 11: 238). In addition it may be desirable to introduce a novel restriction site in the
region of the mutation to create cleavage-based detection. See, e.g., Gasparini, et al., 1992. Mol.
Cell Probes 6: 1. It is anticipated that in certain embodiments amplification may also be
performed using Taq ligase for amplification. See, e.g., Barany, 1991. Proc. Natl. Acad. Sci.
USA 88: 189. In such cases, ligation will occur only if there is a perfect match at the 3'-terminus
of the 5' sequence, making it possible to detect the presence of a known mutation at a specific
site by looking for the presence or absence of amplification.
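
The 3'-terminal variant of allele specific amplification can be pictured with a
deliberately simplified sketch. It assumes an all-or-nothing model in which extension
occurs only when the primer's 3'-terminal base matches the target, whereas in practice a
mismatch may merely reduce extension. The sequences and the names `three_prime_matches`
and `genotype_call` are hypothetical.

```python
def three_prime_matches(primer: str, target: str, end_index: int) -> bool:
    """Check the primer's 3'-terminal base against the target.

    The primer is written 5'->3' and is identical in sequence to the
    target strand shown (so it would anneal to the complementary strand).
    end_index is the 0-based target position of the primer's 3' base.
    """
    start = end_index - len(primer) + 1
    if start < 0 or end_index >= len(target):
        raise ValueError("primer does not fit on the target at this position")
    return primer[-1] == target[end_index]


def genotype_call(wt_primer: str, mut_primer: str,
                  target: str, end_index: int) -> str:
    """Call the allele from which primer is predicted to extend."""
    wt_ok = three_prime_matches(wt_primer, target, end_index)
    mut_ok = three_prime_matches(mut_primer, target, end_index)
    if wt_ok and not mut_ok:
        return "wild-type"
    if mut_ok and not wt_ok:
        return "mutant"
    return "indeterminate"


wild_target = "GATTACAGGC"    # hypothetical wild-type stretch
mutant_target = "GATTACGGGC"  # A -> G at 0-based position 6

call_wt = genotype_call("TTACA", "TTACG", wild_target, 6)
call_mut = genotype_call("TTACA", "TTACG", mutant_target, 6)
```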

The methods described herein may be performed, for example, by utilizing pre-packaged
diagnostic kits comprising at least one probe nucleic acid or antibody reagent described herein,
which may be conveniently used, e.g., in clinical settings to diagnose patients exhibiting
symptoms or family history of a disease or illness involving a SECP gene.

Furthermore, any cell type or tissue, preferably peripheral blood leukocytes, in which
SECP is expressed may be utilized in the prognostic assays described herein. However, any
biological sample containing nucleated cells may be used, including, for example, buccal
mucosal cells.

Pharmacogenomics 

Agents, or modulators that have a stimulatory or inhibitory effect on SECP activity (e.g.,
SECP gene expression), as identified by a screening assay described herein can be administered
to individuals to treat (prophylactically or therapeutically) disorders (e.g., cancer or immune
disorders) associated with aberrant SECP activity. In conjunction with such treatment, the
pharmacogenomics (i.e., the study of the relationship between an individual's genotype and that
individual's response to a foreign compound or drug) of the individual may be considered.
Differences in metabolism of therapeutics can lead to severe toxicity or therapeutic failure by
altering the relation between dose and blood concentration of the pharmacologically active drug.
Thus, the pharmacogenomics of the individual permits the selection of effective agents (e.g.,
drugs) for prophylactic or therapeutic treatments based on a consideration of the individual's
genotype. Such pharmacogenomics can further be used to determine appropriate dosages and
therapeutic regimens. Accordingly, the activity of SECP protein, expression of SECP nucleic
acid, or mutation content of SECP genes in an individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic treatment of the individual.

Pharmacogenomics deals with clinically significant hereditary variations in the response
to drugs due to altered drug disposition and abnormal action in affected persons. See e.g.,
Eichelbaum, 1996. Clin. Exp. Pharmacol. Physiol. 23: 983-985; Linder, 1997. Clin. Chem. 43:
254-266. In general, two types of pharmacogenetic conditions can be differentiated: genetic
conditions transmitted as a single factor altering the way drugs act on the body (altered drug
action), and genetic conditions transmitted as single factors altering the way the body acts on drugs
(altered drug metabolism). These pharmacogenetic conditions can occur either as rare defects or
as polymorphisms. For example, glucose-6-phosphate dehydrogenase (G6PD) deficiency is a
common inherited enzymopathy in which the main clinical complication is hemolysis after
ingestion of oxidant drugs (anti-malarials, sulfonamides, analgesics, nitrofurans) and
consumption of fava beans.

As an illustrative embodiment, the activity of drug metabolizing enzymes is a major
determinant of both the intensity and duration of drug action. The discovery of genetic
polymorphisms of drug metabolizing enzymes (e.g., N-acetyltransferase 2 (NAT 2) and
cytochrome P450 enzymes CYP2D6 and CYP2C19) has provided an explanation as to why
some patients do not obtain the expected drug effects or show exaggerated drug response and
serious toxicity after taking the standard and safe dose of a drug. These polymorphisms are
expressed in two phenotypes in the population, the extensive metabolizer (EM) and poor
metabolizer (PM). The prevalence of PM is different among different populations. For example,
the gene coding for CYP2D6 is highly polymorphic and several mutations have been identified
in PM, which all lead to the absence of functional CYP2D6. Poor metabolizers of CYP2D6 and
CYP2C19 quite frequently experience exaggerated drug response and side effects when they
receive standard doses. If a metabolite is the active therapeutic moiety, PM show no therapeutic
response, as demonstrated for the analgesic effect of codeine mediated by its CYP2D6-formed
metabolite morphine. At the other extreme are the so called ultra-rapid metabolizers who do not
respond to standard doses. Recently, the molecular basis of ultra-rapid metabolism has been
identified to be due to CYP2D6 gene amplification.
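
The relationship between genotype and metabolizer phenotype described above can be
summarized in a small sketch. Only two points are taken from the text: the absence of
functional CYP2D6 corresponds to the PM phenotype, and gene amplification underlies
ultra-rapid metabolism. The numeric threshold (more than two functional copies) and the
function names are illustrative assumptions, not clinical rules.

```python
def metabolizer_phenotype(functional_copies: int) -> str:
    """Map a count of functional CYP2D6 gene copies to a phenotype label.

    Assumption for illustration: 0 copies -> PM, 1-2 copies -> EM,
    more than 2 copies (gene amplification) -> UM (ultra-rapid).
    """
    if functional_copies < 0:
        raise ValueError("copy number cannot be negative")
    if functional_copies == 0:
        return "PM"
    if functional_copies <= 2:
        return "EM"
    return "UM"


def prodrug_response_expected(phenotype: str) -> bool:
    """For a drug whose metabolite is the active moiety (e.g. codeine),
    a PM is not expected to show a therapeutic response."""
    return phenotype != "PM"


phenotypes = [metabolizer_phenotype(n) for n in (0, 1, 2, 3)]
```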

Thus, the activity of SECP protein, expression of SECP nucleic acid, or mutation content
of SECP genes in an individual can be determined to thereby select appropriate agent(s) for
therapeutic or prophylactic treatment of the individual. In addition, pharmacogenetic studies can
be used to apply genotyping of polymorphic alleles encoding drug-metabolizing enzymes to the
identification of an individual's drug responsiveness phenotype. This knowledge, when applied
to dosing or drug selection, can avoid adverse reactions or therapeutic failure and thus enhance
therapeutic or prophylactic efficiency when treating a subject with a SECP modulator, such as a
modulator identified by one of the exemplary screening assays described herein.

Monitoring of Effects During Clinical Trials 

Monitoring the influence of agents (e.g., drugs, compounds) on the expression or activity
of SECP (e.g., the ability to modulate aberrant cell proliferation and/or differentiation) can be
applied not only in basic drug screening, but also in clinical trials. For example, the
effectiveness of an agent determined by a screening assay as described herein to increase SECP
gene expression, protein levels, or upregulate SECP activity, can be monitored in clinical trials
of subjects exhibiting decreased SECP gene expression, protein levels, or down-regulated SECP
activity. Alternatively, the effectiveness of an agent determined by a screening assay to decrease
SECP gene expression, protein levels, or down-regulate SECP activity, can be monitored in
clinical trials of subjects exhibiting increased SECP gene expression, protein levels, or
up-regulated SECP activity. In such clinical trials, the expression or activity of SECP and,
preferably, other genes that have been implicated in, for example, a cellular proliferation or
immune disorder can be used as a "read out" or markers of the immune responsiveness of a
particular cell.

By way of example, and not of limitation, genes, including SECP, that are modulated in
cells by treatment with an agent (e.g., compound, drug or small molecule) that modulates SECP
activity (e.g., identified in a screening assay as described herein) can be identified. Thus, to
study the effect of agents on cellular proliferation disorders, for example, in a clinical trial, cells
can be isolated and RNA prepared and analyzed for the levels of expression of SECP and other
genes implicated in the disorder. The levels of gene expression (i.e., a gene expression pattern)
can be quantified by Northern blot analysis or RT-PCR, as described herein, or alternatively by
measuring the amount of protein produced, by one of the methods as described herein, or by
measuring the levels of activity of SECP or other genes. In this manner, the gene expression
pattern can serve as a marker, indicative of the physiological response of the cells to the agent.
Accordingly, this response state may be determined before, and at various points during,
treatment of the individual with the agent.

In one embodiment, the invention provides a method for monitoring the effectiveness of
treatment of a subject with an agent (e.g., an agonist, antagonist, protein, peptide,
peptidomimetic, nucleic acid, small molecule, or other drug candidate identified by the screening
assays described herein) comprising the steps of (i) obtaining a pre-administration sample from a
subject prior to administration of the agent; (ii) detecting the level of expression of a SECP
protein, mRNA, or genomic DNA in the pre-administration sample; (iii) obtaining one or more
post-administration samples from the subject; (iv) detecting the level of expression or activity of
the SECP protein, mRNA, or genomic DNA in the post-administration samples; (v) comparing
the level of expression or activity of the SECP protein, mRNA, or genomic DNA in the
pre-administration sample with the SECP protein, mRNA, or genomic DNA in the
post-administration sample or samples; and (vi) altering the administration of the agent to the subject
accordingly. For example, increased administration of the agent may be desirable to increase the
expression or activity of SECP to higher levels than detected, i.e., to increase the effectiveness of
the agent. Alternatively, decreased administration of the agent may be desirable to decrease
expression or activity of SECP to lower levels than detected, i.e., to decrease the effectiveness of
the agent.
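
Steps (v) and (vi) of the monitoring method above amount to a comparison followed by
a decision. The sketch below is one hypothetical way to express that logic. It assumes an
agent that raises SECP expression, a numeric target level relative to the
pre-administration sample, and a 10% tolerance band; none of these values comes from the
specification, and the function name is invented.

```python
from statistics import mean


def recommend_adjustment(pre_level: float, post_levels: list[float],
                         target_fold: float, tolerance: float = 0.1) -> str:
    """Compare post- to pre-administration expression and suggest a change.

    Assumptions (illustrative only): the agent increases SECP expression,
    target_fold is the desired post/pre ratio, and tolerance is the
    acceptable relative deviation from that target.
    """
    if pre_level <= 0:
        raise ValueError("pre-administration level must be positive")
    if not post_levels:
        raise ValueError("at least one post-administration sample is required")
    fold = mean(post_levels) / pre_level
    if fold < target_fold * (1 - tolerance):
        return "increase administration"
    if fold > target_fold * (1 + tolerance):
        return "decrease administration"
    return "maintain administration"


suggestion = recommend_adjustment(pre_level=1.0, post_levels=[1.2, 1.4],
                                  target_fold=2.0)
```

An agent intended to lower SECP expression would need the comparison reversed; the
sketch covers only the up-regulating case.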

Methods of Treatment 

The invention provides for both prophylactic and therapeutic methods of treating a
subject at risk of (or susceptible to) a disorder or having a disorder associated with aberrant
SECP expression or activity. These methods of treatment will be discussed more fully, below.

Disease and Disorders

Diseases and disorders that are characterized by increased (relative to a subject not
suffering from the disease or disorder) levels or biological activity may be treated with
Therapeutics that antagonize (i.e., reduce or inhibit) activity. Therapeutics that antagonize
activity may be administered in a therapeutic or prophylactic manner. Therapeutics that may be
utilized include, but are not limited to: (i) an aforementioned peptide, or analogs, derivatives,
fragments or homologs thereof; (ii) antibodies to an aforementioned peptide; (iii) nucleic acids
encoding an aforementioned peptide; (iv) administration of antisense nucleic acid and nucleic
acids that are "dysfunctional" (i.e., due to a heterologous insertion within the coding sequences
of an aforementioned peptide) that are utilized to "knockout" endogenous
function of an aforementioned peptide by homologous recombination (see, e.g., Capecchi, 1989.
Science 244: 1288-1292); or (v) modulators (i.e., inhibitors, agonists and antagonists, including
additional peptide mimetic of the invention or antibodies specific to a peptide of the invention)
that alter the interaction between an aforementioned peptide and its binding partner.

Diseases and disorders that are characterized by decreased (relative to a subject not
suffering from the disease or disorder) levels or biological activity may be treated with
Therapeutics that increase (i.e., are agonists to) activity. Therapeutics that upregulate activity
may be administered in a therapeutic or prophylactic manner. Therapeutics that may be utilized
include, but are not limited to, an aforementioned peptide, or analogs, derivatives, fragments or
homologs thereof; or an agonist that increases bioavailability.

Increased or decreased levels can be readily detected by quantifying peptide and/or RNA,
by obtaining a patient tissue sample (e.g., from biopsy tissue) and assaying it in vitro for RNA or
peptide levels, structure and/or activity of the expressed peptides (or mRNAs of an
aforementioned peptide). Methods that are well-known within the art include, but are not limited
to, immunoassays (e.g., by Western blot analysis, immunoprecipitation followed by sodium
dodecyl sulfate (SDS) polyacrylamide gel electrophoresis, immunocytochemistry, etc.) and/or
hybridization assays to detect expression of mRNAs (e.g., Northern assays, dot blots, in situ
hybridization, and the like).

Prophylactic Methods 

In one aspect, the invention provides a method for preventing, in a subject, a disease or
condition associated with aberrant SECP expression or activity, by administering to the
subject an agent that modulates SECP expression or at least one SECP activity. Subjects at risk
for a disease that is caused or contributed to by aberrant SECP expression or activity can be
identified by, for example, any or a combination of diagnostic or prognostic assays as described
herein. Administration of a prophylactic agent can occur prior to the manifestation of symptoms
characteristic of the SECP aberrancy, such that a disease or disorder is prevented or,
alternatively, delayed in its progression. Depending upon the type of SECP aberrancy, for
example, a SECP agonist or SECP antagonist agent can be used for treating the subject. The
appropriate agent can be determined based on screening assays described herein.

Therapeutic Methods 

Another aspect of the invention pertains to methods of modulating SECP expression or
activity for therapeutic purposes. The modulatory method of the invention involves contacting a
cell with an agent that modulates one or more of the activities of SECP protein
associated with the cell. An agent that modulates SECP protein activity can be an agent as
described herein, such as a nucleic acid or a protein, a naturally-occurring cognate ligand of a
SECP protein, a peptide, a SECP peptidomimetic, or other small molecule. In one embodiment,
the agent stimulates one or more SECP protein activities. Examples of such stimulatory agents
include active SECP protein and a nucleic acid molecule encoding SECP that has been
introduced into the cell. In another embodiment, the agent inhibits one or more SECP protein
activities. Examples of such inhibitory agents include antisense SECP nucleic acid molecules and
anti-SECP antibodies. These modulatory methods can be performed in vitro (e.g., by culturing
the cell with the agent) or, alternatively, in vivo (e.g., by administering the agent to a subject).
As such, the invention provides methods of treating an individual afflicted with a disease or
disorder characterized by aberrant expression or activity of a SECP protein or nucleic acid
molecule. In one embodiment, the method involves administering an agent (e.g., an agent
identified by a screening assay described herein), or combination of agents that modulates (e.g.,
up-regulates or down-regulates) SECP expression or activity. In another embodiment, the
method involves administering a SECP protein or nucleic acid molecule as therapy to
compensate for reduced or aberrant SECP expression or activity.

Stimulation of SECP activity is desirable in situations in which SECP is abnormally
down-regulated and/or in which increased SECP activity is likely to have a beneficial effect.
One example of such a situation is where a subject has a disorder characterized by aberrant cell
proliferation and/or differentiation (e.g., cancer or immune associated disorders). Another
example of such a situation is where the subject has a gestational disease (e.g., preeclampsia).

Determination of the Biological Effect of the Therapeutic

In various embodiments of the invention, suitable in vitro or in vivo assays are performed
to determine the effect of a specific Therapeutic and whether its administration is indicated for
treatment of the affected tissue.

In various specific embodiments, in vitro assays may be performed with representative
cells of the type(s) involved in the patient's disorder, to determine if a given Therapeutic exerts
the desired effect upon the cell type(s). Compounds for use in therapy may be tested in suitable
animal model systems including, but not limited to rats, mice, chicken, cows, monkeys, rabbits,
and the like, prior to testing in human subjects. Similarly, for in vivo testing, any of the animal
model systems known in the art may be used prior to administration to human subjects.

Prophylactic and Therapeutic Uses of the Compositions of the Invention 

The SECP nucleic acids and proteins of the invention may be useful in a variety of
potential prophylactic and therapeutic applications. By way of a non-limiting example, a cDNA
encoding the SECP protein of the invention may be useful in gene therapy, and the protein may
be useful when administered to a subject in need thereof.

Both the novel nucleic acids encoding the SECP proteins, and the SECP proteins of the
invention, or fragments thereof, may also be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein are to be assessed. These materials are
further useful in the generation of antibodies which immunospecifically-bind to the novel
substances of the invention for use in therapeutic or diagnostic methods.

The invention will be further illustrated in the following non-limiting examples. 



Example 1: Radiation Hybrid Mapping Provides the Chromosomal
Location of SECP2 (Clone 11618130.0.27)

Radiation hybrid mapping using human chromosome markers was carried out to
determine the chromosomal location of a SECP2 nucleic acid of the invention. The procedure
used to obtain these results is described generally in Steen, et al., 1999. A High-Density
Integrated Genetic Linkage and Radiation Hybrid Map of the Laboratory Rat, Genome Res. 9:
AP1-AP8 (Published Online on May 21, 1999). A panel of 93 cell clones containing randomized
radiation-induced human chromosomal fragments was then screened in 96 well plates using PCR
primers designed to identify the sought clones in a unique fashion. Clone 11618130.0.27, a
SECP2 nucleic acid, was located on chromosome 16 at a map distance of 26.0 cR from marker
WI-3768 and -70.5 cR from marker TIGR-A002K05.

Example 2: Molecular Cloning of Clone 11618130 

Oligonucleotide PCR primers were designed to amplify a DNA segment coding for the
full length open reading frame of clone 11618130. The forward primer included a Bgl II
restriction site and the consensus Kozak sequence CCACC. The reverse primer contained an
in-frame XhoI restriction site. Both primers contained a CTCGTC 5'-terminus clamp. The
nucleotide sequences of the primers were:

11618130 Forward Primer:

CTCGTCAGATCTCCACCATGAGTGATGAGGACAGCTGTGTAG (SEQ ID NO:19)

11618130 Reverse Primer:

CTCGTCCTCGAGGCAGCTGGTTGGTTGGCTTATGTTG (SEQ ID NO:20)
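
The layout of the two primers (clamp, restriction site, optional Kozak sequence, then
gene-specific sequence) can be checked with a short sketch. The primer sequences, the
CTCGTC clamp and the CCACC Kozak sequence are taken from the text; the recognition
sequences AGATCT (Bgl II) and CTCGAG (XhoI) are the standard ones for those enzymes and
are not stated in the specification. The function name `gene_specific_part` is hypothetical.

```python
CLAMP = "CTCGTC"  # 5'-terminus clamp named in the text
KOZAK = "CCACC"   # consensus Kozak sequence named in the text

# Standard recognition sequences (not given in the specification).
RESTRICTION_SITES = {"BglII": "AGATCT", "XhoI": "CTCGAG"}

FORWARD = "CTCGTCAGATCTCCACCATGAGTGATGAGGACAGCTGTGTAG"  # SEQ ID NO:19
REVERSE = "CTCGTCCTCGAGGCAGCTGGTTGGTTGGCTTATGTTG"       # SEQ ID NO:20


def gene_specific_part(primer: str, enzyme: str, kozak: bool = False):
    """Strip clamp + restriction site (+ Kozak) from the 5' end.

    Returns the remaining gene-specific sequence, or None if the primer
    does not start with the expected layout.
    """
    prefix = CLAMP + RESTRICTION_SITES[enzyme] + (KOZAK if kozak else "")
    if not primer.startswith(prefix):
        return None
    return primer[len(prefix):]


forward_tail = gene_specific_part(FORWARD, "BglII", kozak=True)
reverse_tail = gene_specific_part(REVERSE, "XhoI")
```

Under these assumptions the forward primer's gene-specific portion begins with the ATG
start codon immediately after the Kozak sequence.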

The PCR reactions included: 5 ng human fetal brain cDNA template; 1 µM of each of the
11618130 Forward and 11618130 Reverse primers; 5 µM dNTP (Clontech Laboratories; Palo
Alto, CA) and 1 µl of 50x Advantage-HF 2 polymerase (Clontech Laboratories; Palo Alto, CA)
in 50 µl total reaction volume. The following PCR conditions were used:

a) 96°C 3 minutes

b) 96°C 30 seconds denaturation

c) 70°C 30 seconds, primer annealing. This temperature was gradually decreased
by 1°C/cycle

d) 72°C 1 minute extension.
Repeat steps b-d a total of 10 times

e) 96°C 30 seconds denaturation

f) 60°C 30 seconds annealing

g) 72°C 1 minute extension
Repeat steps e-g a total of 25 times

h) 72°C 5 minutes final extension

A single, amplified product of approximately 800 bp was detected by agarose gel
electrophoresis. The PCR amplification product was then isolated by the QIAEX II® Gel
Extraction System (QIAGEN, Inc; Valencia, CA) in a final volume of 20 µl.
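
The cycling conditions of this example (steps a through h) can be written out as an
explicit schedule. The sketch below assumes that the annealing temperature in step c
starts at 70°C in the first cycle and drops by 1°C in each subsequent cycle, which is one
reading of the step as described; the function name and the tuple layout are hypothetical.

```python
def touchdown_program() -> list[tuple[str, int, int]]:
    """Return the program as (step name, temperature in °C, duration in s)."""
    steps = [("initial denaturation", 96, 180)]             # step a
    for cycle in range(10):                                 # steps b-d, 10 times
        steps.append(("denaturation", 96, 30))
        steps.append(("annealing", 70 - cycle, 30))         # assumed 1°C drop per cycle
        steps.append(("extension", 72, 60))
    for _ in range(25):                                     # steps e-g, 25 times
        steps.append(("denaturation", 96, 30))
        steps.append(("annealing", 60, 30))
        steps.append(("extension", 72, 60))
    steps.append(("final extension", 72, 300))              # step h
    return steps


program = touchdown_program()
annealing_temps = [temp for name, temp, _ in program if name == "annealing"]
total_hold_seconds = sum(seconds for _, _, seconds in program)
```

Under this reading the last touchdown cycle anneals at 61°C, one degree above the fixed
60°C annealing step of the following 25 cycles. The total counts programmed hold times
only and ignores ramping between temperatures.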

A total of 10 µl of the isolated fragment was digested with Bgl II and XhoI restriction
enzymes, and ligated into the BamHI- and XhoI-digested mammalian expression vector
pcDNA3.1 V5His (Invitrogen; Carlsbad, CA). The construct was sequenced, and the cloned
insert was verified as a sequence identical to the ORF coding for the full length 11618130. The
construct was designated pcDNA3.1-11618130-S178-2.

Example 3: Expression of 11618130 In Human Embryonic Kidney 293 Cells 

The vector pcDNA3.1-11618130-S178-2 described in Example 2 was subsequently
transfected into human embryonic kidney 293 cells (ATCC No. CRL-1573; Manassas, VA)
using the LipofectaminePlus Reagent following the manufacturer's instructions (Gibco/BRL/Life
Technologies; Rockville, MD). The cell pellet and supernatant were harvested 72 hours after
transfection, and examined for 11618130 expression by use of SDS-PAGE under reducing
conditions and Western blotting with an anti-V5 antibody. FIG. 12 shows that 11618130 was
expressed as a protein having an apparent molecular weight (Mr) of approximately 34
kilodaltons (kDa) which was intracellularly expressed in the 293 cells. These experimental results
were consistent with the predicted molecular weight of 28043 Daltons for the protein of clone
11618130.0.27 and with the predicted localization of the protein intracellularly in the microbody
(peroxisome). A second band of approximately 54 kDa was also found, which may represent a
non-reducible dimer of this protein.

Example 4: Preparation of Mammalian Expression Vector pSecV5His

The oligonucleotide primers, pSec-V5-His Forward and pSec-V5-His Reverse, were 
generated to amplify a fragment from the pcDNA3.1-V5His (Invitrogen; Carlsbad, CA) 
expression vector that includes V5 and His6. The nucleotide sequences of these primers were: 

pSec-V5-His Forward Primer: 

CTCGTCCTCGAGGGTAAGCCTATCCCTAAC (SEQ ID NO:21) 

pSec-V5-His Reverse Primer: 

CTCGTCGGGCCCCTGATCAGCGGGTTTAAAC (SEQ ID NO:22) 

The PCR product was digested with XhoI and ApaI, and ligated into the XhoI/ApaI-digested 
pSecTag2 B vector harboring an Ig kappa leader sequence (Invitrogen; Carlsbad, CA). 
The correct structure of the resulting vector (designated pSecV5His), including an in-frame 
Ig-kappa leader and V5-His6, was verified by DNA sequence analysis. The pSecV5His vector 
included an in-frame Ig kappa leader, a site for insertion of a clone of interest, V5 and His6, 
which allows heterologous protein expression and secretion by fusing any protein to the Ig kappa 
chain signal peptide. Detection and purification of the expressed protein were aided by the 
presence of the V5 epitope tag and 6x His tag at the carboxyl-terminus (Invitrogen; Carlsbad, 
CA). 
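
As a consistency check on the primer design of this Example, the following minimal Python sketch locates the XhoI and ApaI recognition sequences in the two primers. The recognition sequences (CTCGAG for XhoI, GGGCCC for ApaI) are the standard ones for these enzymes; the function name `find_site` and the dictionary names are illustrative choices and are not part of the original disclosure.

```python
# Standard recognition sequences for the two enzymes used in Example 4.
SITES = {"XhoI": "CTCGAG", "ApaI": "GGGCCC"}

# Primer sequences as listed in Example 4 (SEQ ID NO:21 and SEQ ID NO:22).
PRIMERS = {
    "pSec-V5-His Forward": "CTCGTCCTCGAGGGTAAGCCTATCCCTAAC",
    "pSec-V5-His Reverse": "CTCGTCGGGCCCCTGATCAGCGGGTTTAAAC",
}


def find_site(primer, site):
    """Return the 0-based offset of the first occurrence of site, or -1 if absent."""
    return primer.upper().find(site)


# Report every recognition sequence found in each primer.
for name, seq in PRIMERS.items():
    for enzyme, site in SITES.items():
        offset = find_site(seq, site)
        if offset >= 0:
            print(f"{name}: {enzyme} site ({site}) at offset {offset}")
```

In both primers the recognition sequence directly follows the six-nucleotide CTCGTC clamp, which is consistent with the XhoI/ApaI digestion described above.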

Example 5: Molecular Cloning of 16406477 

Oligonucleotide PCR primers were designed to amplify a DNA segment encoding the 
mature form of clone 16406477 from amino acid residues 38 to 385, taking into account the signal 
sequence predicted for this polypeptide. The forward primer contained an in-frame BamHI 
restriction site and the reverse primer contained an in-frame XhoI restriction site. Both primers 
contained the CTCGTC 5' clamp. The sequences of the primers were as follows: 

16406477 Forward Primer: 

CTCGTCGGATCCTGGGGCGCAGGGGAAGCCCCGGG (SEQ ID NO:23) 

16406477 Reverse Primer: 

CTCGTCCTCGAGGAGGGCAGCAAGGAGGCTGAGGGGCAG (SEQ ID NO:24) 

The PCR reactions contained: 5 ng human fetal brain cDNA template; 1 µM of each of 
the 16406477 Forward and 16406477 Reverse Primers; 5 µM dNTP (Clontech Laboratories; 
Palo Alto, CA) and 1 µl of 50x Advantage-HF 2 polymerase (Clontech Laboratories; Palo Alto, 
CA) in a 50 µl total reaction volume. PCR was then conducted using reaction conditions 
identical to those previously described in Example 2. 

A single, amplified product of approximately 1 Kbp was detected by agarose gel 
electrophoresis. The product was then isolated by QIAEX II® Gel Extraction System 
(QIAGEN, Inc; Valencia, CA) in a total reaction volume of 20 µl. 
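
The observed product size of approximately 1 Kbp can be compared with the length expected from the primer design. The sketch below is illustrative only and rests on two assumptions not stated in the text: that the gene-specific portions of the primers anneal at the extreme ends of the region encoding residues 38 to 385, and that no stop codon is appended (as an in-frame fusion to V5-His6 would require). The function names are hypothetical.

```python
CODON_LENGTH = 3
CLAMP = "CTCGTC"            # 5' clamp present on both primers
BAMHI = "GGATCC"            # standard BamHI recognition sequence
XHOI = "CTCGAG"             # standard XhoI recognition sequence

# Primer sequences as listed in Example 5 (SEQ ID NO:23 and SEQ ID NO:24).
FORWARD = "CTCGTCGGATCCTGGGGCGCAGGGGAAGCCCCGGG"
REVERSE = "CTCGTCCTCGAGGAGGGCAGCAAGGAGGCTGAGGGGCAG"


def tail_length(primer, site):
    """Length of the non-templated 5' tail (clamp plus restriction site)."""
    if not primer.startswith(CLAMP + site):
        raise ValueError("primer does not start with clamp + restriction site")
    return len(CLAMP) + len(site)


def expected_amplicon_length(first_residue, last_residue):
    """Coding region for the given residue range plus both 5' tails (assumed model)."""
    coding = (last_residue - first_residue + 1) * CODON_LENGTH
    return coding + tail_length(FORWARD, BAMHI) + tail_length(REVERSE, XHOI)


print(expected_amplicon_length(38, 385))  # 348 codons * 3 + 12 + 12 = 1068
```

Under these assumptions the expected product is 1068 bp, in agreement with the band of approximately 1 Kbp.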

A total of 10 µl of the isolated fragment was digested with BamHI and XhoI restriction 
enzymes, and ligated into the pSecV5His mammalian expression vector (see, Example 4) which 
had been previously digested with BamHI and XhoI. The construct was sequenced, and the 
cloned insert was verified as possessing a sequence identical to that of the ORF coding for the 
mature fragment of clone 16406477. The construct was subsequently designated 
pSecV5His-16406477-S196-A. 

Example 6: Expression of 16406477 in Human Embryonic Kidney 293 Cells 
The pSecV5His-16406477-S196-A construct (see, Example 5) was subsequently 
transfected into 293 cells (ATCC No. CRL-1573; Manassas, VA) using the LipofectaminePlus 
Reagent following the manufacturer's instructions (Gibco/BRL/Life Technologies). The cell 
pellet and supernatant were harvested 72 hours after transfection, and examined for 16406477 
expression by use of SDS-PAGE under reducing conditions and Western blotting with an 
anti-V5 antibody. FIG. 13 demonstrates that 16406477 is expressed as a protein having an apparent 
molecular weight (Mr) of approximately 45 kDa which is retained intracellularly in the 293 cells. 
The Mr value which was found upon expression of the clone is consistent with the predicted 
molecular weight of 43087 Daltons. 

Example 7: Quantitative Tissue Expression Analysis of Clones of the Invention 

The Quantitative Expression Analysis of several clones of the invention was performed in 
41 normal and 55 tumor samples (see, FIG. 14) by real-time quantitative PCR (TAQMAN®) by 
use of a Perkin-Elmer Biosystems ABI PRISM® 7700 Sequence Detection System. The 
following abbreviations are used in FIG. 14: 

ca. = carcinoma, 

* = established from metastasis, 

met = metastasis, 

s cell var = small cell variant, 

non-s = non-sm = non-small, 

squam = squamous, 

pl. eff = pl effusion = pleural effusion, 

glio = glioma, 

astro = astrocytoma, and 

neuro = neuroblastoma. 

Initially, 96 RNA samples were normalized to β-actin and GAPDH. RNA (~50 ng total 
or ~1 ng poly(A)+) was converted to cDNA using the TAQMAN® Reverse Transcription 
Reagents Kit (PE Biosystems; Foster City, CA; Catalog No. N808-0234) and random hexamers 
according to the manufacturer's protocol. Reactions were performed in a 20 µl total volume, 
and incubated for 30 minutes at 48°C. cDNA (5 µl) was then transferred to a separate plate for 
the TAQMAN® reaction using β-actin and GAPDH TAQMAN® Assay Reagents (PE 
Biosystems; Catalog Nos. 4310881E and 4310884E, respectively) and TAQMAN® Universal 
PCR Master Mix (PE Biosystems; Catalog No. 4304447) according to the manufacturer's 
protocol. Reactions were performed in a 25 µl total volume using the following parameters: 
2 minutes at 50°C; 10 minutes at 95°C; 15 seconds at 95°C/1 min. at 60°C (40 cycles total). 

Results were recorded as CT values (i.e., cycle at which a given sample crosses a 
threshold level of fluorescence) using a log scale, with the difference in RNA concentration 
between a given sample and the sample with the lowest CT value being represented as 2 to the 
power of delta CT. The percent relative expression is then obtained by taking the reciprocal of 
this RNA difference and multiplying by 100. The average CT values obtained for β-actin and 
GAPDH were used to normalize RNA samples. The RNA sample generating the highest CT value 
required no further diluting, while all other samples were diluted relative to this sample 
according to their β-actin/GAPDH average CT values. 
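
The percent relative expression calculation described above can be sketched as follows, assuming that the RNA difference is 2 raised to the power of the CT difference (delta CT) between a sample and the sample with the lowest CT. The sample names and CT values are hypothetical and are not taken from FIG. 14; the function name is an illustrative choice.

```python
def percent_relative_expression(ct_values):
    """Map each sample to 100 / 2**(CT - lowest CT), per the description above."""
    ct_min = min(ct_values.values())
    return {
        sample: 100.0 / (2.0 ** (ct - ct_min))
        for sample, ct in ct_values.items()
    }


# Hypothetical CT values for three samples (illustration only).
example_ct = {"tissue A": 25.0, "tissue B": 27.0, "tissue C": 30.0}

for sample, percent in percent_relative_expression(example_ct).items():
    print(f"{sample}: {percent:.3f}%")
```

With these hypothetical inputs, the sample with the lowest CT is assigned 100%, a sample two cycles later is assigned 25%, and a sample five cycles later is assigned 3.125%.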

Normalized RNA (5 µl) was converted to cDNA and analyzed via TAQMAN® using One 
Step RT-PCR Master Mix Reagents (PE Biosystems; Catalog No. 4309169) and gene-specific 
primers according to the manufacturer's instructions. Probes and primers were designed for each 
assay according to Perkin Elmer Biosystem's Primer Express Software package (Version I for 
Apple Computer's Macintosh Power PC) using the sequence of the respective clones as input. 
Default settings were used for reaction conditions and the following parameters were set before 
selecting primers: primer concentration = 250 nM; primer melting temperature (Tm) range = 58°- 
60° C; primer optimal Tm = 59° C; maximum primer difference = 2° C; probe does not possess a 
5'-terminus G; probe Tm must be 10° C greater than primer Tm; and amplicon size 75 bp to 100 
bp in length. The probes and primers were synthesized by Synthegen (Houston, TX). Probes 
were double-purified by HPLC to remove uncoupled dye and then evaluated by mass 
spectroscopy to verify coupling of reporter and quencher dyes to the 5'- and 3'-termini of the 
probe, respectively. The final concentrations used were: Forward and Reverse Primers = 900 
nM each; and probe = 200 nM. 

Subsequent PCR conditions were as follows. Normalized RNA from each tissue and 
each cell line was spotted in each well of a 96 well PCR plate (Perkin Elmer Biosystems). PCR 
reaction mixes, including two probes (i.e., a SECP-specific probe and another gene-specific probe 
multiplexed with the SECP-specific probe) were set up using 1x TaqMan™ PCR Master Mix for 
the PE Biosystems 7700, with 5 mM MgCl2; dNTPs (dA, G, C, U at 1:1:1:2 ratios); 0.25 U/ml 
AmpliTaq Gold™ (PE Biosystems); 0.4 U/µl RNase inhibitor; and 0.25 U/µl Reverse 
Transcriptase. Reverse transcription was then performed at 48°C for 30 minutes, followed by 
amplification/PCR cycles as follows: 95°C for 10 minutes, then 40 cycles of 95°C for 15 seconds 
and 60°C for 1 minute. 
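
For planning purposes, the one-step RT-PCR program described above can be written out as a list of holds and its total programmed hold time computed. This is a minimal sketch: the data layout and function name are illustrative, and instrument ramp times between temperatures are not included because the text does not state them.

```python
# Each step: (description, temperature in degrees C, duration in seconds, repeats).
PROGRAM = [
    ("reverse transcription", 48, 30 * 60, 1),
    ("initial denaturation / polymerase activation", 95, 10 * 60, 1),
    ("cycle: denaturation", 95, 15, 40),
    ("cycle: anneal/extend", 60, 60, 40),
]


def total_hold_minutes(program):
    """Sum of all programmed holds, in minutes (ramp times excluded)."""
    total_seconds = sum(duration * repeats for _, _, duration, repeats in program)
    return total_seconds / 60


print(total_hold_minutes(PROGRAM))  # 30 + 10 + 40 * 1.25 = 90.0
```

The programmed holds therefore sum to 90 minutes per plate, before any allowance for ramping.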

The primer-probe sets employed in the expression analysis of each clone, and a summary 
of the results, are provided below. The complete experimental results are illustrated in FIG. 14. 
The panel of cell lines employed was identical in all cases except that samples 95 and 96 were 
gDNA and a melanoma UACC-257 (control), respectively, in the experiments for clone 
11696905. The nucleotide sequences of the primer sets used for these clones are as follows: 

Clone 11696905.0.47 Primer Set: 

Ag 383 (F): 5'-ggcctctccgtacccttctc-3' (SEQ ID NO:25) 

Ag 383 (R): 5'-agaggctcttggcgcagtt-3' (SEQ ID NO:26) 

Ag 383 (P): TET-5'-accaggatcacgacctccgcagg-3'-TAMRA (SEQ ID NO:27) 

Primer Set Ag 383 was designed to probe for nucleotides 403-478 in SECP3 (clone 
11696905.0.47). The results indicate that the clone was prominently expressed in normal cells 
such as adipose, adrenal gland, various regions of the brain, skeletal muscle, bladder, liver and 
fetal liver, mammary gland, placenta, prostate and testis. It was also found to be expressed at 
levels much higher than in comparable normal cells in cancers of the kidney and lung, and 
expressed at levels much lower than in comparable normal cells in cancers of the central nervous 
system (CNS) and breast. These results suggest that SECP3 (clone 11696905.0.47), or 
fragments thereof, may be useful in probing for cancer in kidney and lung, and that the nucleic 
acid or the protein of clone 11696905.0.47 may be a target for therapeutic agents in such cancers. 
These nucleic acids and proteins may be useful as therapeutic agents in treating cancers of the 
CNS and breast. 

Clone 16406477.0.206 Primer Set: 

Ag 53 (F): 5'-gcctggcacggactatgtgt-3' (SEQ ID NO:28) 

Ag 53 (R): 5'-gccgtcagccttggaaagt-3' (SEQ ID NO:29) 

Ag 53 (P): TET-5'-ccattcccgctgcactgtgacg-3'-TAMRA (SEQ ID NO:30) 

SECP7 (clone 16406477.0.206) was found to be expressed essentially exclusively in 
testis cells, with a low level of expression in the hypothalamus, among the cells tested. 

Clone 21433858 Primer Set: 

Ag 127 (F): 5'-cctgccaggatgactgtcaatt-3' (SEQ ID NO:31) 

Ag 127 (R): 5'-TGGTCCT7yvCTGCACCACAGTCT-3' (SEQ ID NO:32) 

Ag 127 (P): TET-5'-CCAGCTGGTCCAAGTTTTCTTCATGCAA-3'-TAMRA (SEQ ID NO:33) 

Probe set Ag 127 targets nucleotides 2524-2601 of SECP1 (clone 21433858). The results 
show that the clone is expressed principally in normal tissues such as adipose, brain, bladder, 
fetal and adult kidney, mammary gland, myometrium, uterus, placenta, and testis. In comparison 
to normal lung tissue, it is highly expressed in a small cell lung cancer, a large cell lung cancer, 
and a non-small cell lung cancer. Therefore, SECP1 (clone 21433858), or a fragment thereof, 
may be useful as a diagnostic probe for such lung cancers. The nucleic acids or proteins of 
SECP1 (clone 21433858) may furthermore serve as targets for the treatment of cancer in these 
and other tissues. 

Clone 21637262.0.64 Primer Set: 

Ab5 (F): 5'-GTGATCCTCAGGCTGGACCA-3' (SEQ ID NO:34) 

Ab5 (R): 5'-ttctgactgggctgcatcc-3' (SEQ ID NO:35) 

Ab5 (P): FAM-5'-ccagtgtttcctcagcacagggcc-3'-TAMRA (SEQ ID NO:36) 

Probe set Ab5 targets nucleotides 1221-1298 in SECP9 (clone 21637262.0.64). The 
results shown in FIG. 14 demonstrate that SECP9 (clone 21637262.0.64) is expressed in cells 
from normal tissues including, especially, the salivary gland and trachea, among those cells 
examined. 

Table ??. Probe and Primer Set: Ag 815 for CG106318_01 

Primers   Sequences                                     TM     Length   Start Position   SEQ ID NO 
Forward   5'-TGTGCTCAGCACATGGTCTA-3'                    59     20       1722             37 
Probe     FAM-5'-ACACCTGCTCAGGGAAAACGACAGAA-3'-TAMRA    69.9   26       1760             38 
Reverse   5'-TCGTGCTCGTATCTGTTTCC-3'                    58.9   20       1787             39 
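
The Ag 815 set can be checked against the primer and probe design parameters stated in Example 7. The sketch below is illustrative and makes assumptions that the text does not confirm: the maximum primer Tm difference is taken as 2° C (the value is garbled in the source), "10° C greater" is read as at least 10° C, and the amplicon length is computed as the reverse primer start position plus its length minus the forward primer start position. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Oligo:
    sequence: str
    tm: float      # melting temperature in degrees C, from the table
    start: int     # start position, from the table


# Values taken from the Ag 815 table for CG106318_01.
FORWARD = Oligo("TGTGCTCAGCACATGGTCTA", 59.0, 1722)
PROBE = Oligo("ACACCTGCTCAGGGAAAACGACAGAA", 69.9, 1760)
REVERSE = Oligo("TCGTGCTCGTATCTGTTTCC", 58.9, 1787)


def amplicon_length(fwd, rev):
    """Assumed model: span from forward start to the end of the reverse primer."""
    return rev.start + len(rev.sequence) - fwd.start


def check_set(fwd, probe, rev, max_primer_tm_diff=2.0):
    """Evaluate each Example 7 design parameter; 2.0 is an assumed default."""
    primers = (fwd, rev)
    return {
        "primer_tm_58_to_60": all(58.0 <= p.tm <= 60.0 for p in primers),
        "primer_tm_difference": abs(fwd.tm - rev.tm) <= max_primer_tm_diff,
        "probe_no_5prime_g": not probe.sequence.upper().startswith("G"),
        "probe_tm_10_above": all(probe.tm - p.tm >= 10.0 for p in primers),
        "amplicon_75_to_100": 75 <= amplicon_length(fwd, rev) <= 100,
    }


print(amplicon_length(FORWARD, REVERSE))
for rule, passed in check_set(FORWARD, PROBE, REVERSE).items():
    print(f"{rule}: {'pass' if passed else 'fail'}")
```

Under these assumptions the amplicon is 85 bp and the set satisfies every listed parameter.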



Other Embodiments 

While the invention has been described in conjunction with the detailed description 
thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, 
which is defined by the scope of the appended claims. Other aspects, advantages, and 
modifications are within the scope of the following claims. 